SVM-Helper: Task Management Software for LibSVM

by Robert Zaremba


Date Submitted: Oct 2006

Problem and Motivation

The goal of this project is to develop software that aids in the use of LibSVM[1], a popular open-source implementation of Support Vector Machines. The standard way to interact with LibSVM is through a command line interface that accepts custom settings as arguments. This interface can be difficult to use because of the large number of available settings, but it is sufficient for typical use. A single LibSVM task may take hours to complete and each task runs sequentially, so with multi-processor machines now widely available it is desirable to run several instances of the program at the same time. Managing multiple command line windows is inconvenient, so a GUI that manages multiple concurrently running tasks may be desirable. Additionally, the end users of data mining software are often uninitiated in the science of knowledge discovery[3]; a user-friendly GUI that hides complexity can be an asset to such users.

Background and Related Work

A variety of software exists to aid in knowledge discovery via a GUI. One of the most prominent examples is the Waikato Environment for Knowledge Analysis, commonly referred to as Weka, which provides a variety of machine learning, preprocessing, and visualization tools[5]. While Weka provides a notable number of features, some utilities included in the LibSVM package are unavailable in Weka, such as the grid search utility, which uses heuristics to determine the best parameters for a specific problem[1]. Because Weka and LibSVM use different file formats, it is difficult to switch between the two. A second advantage of a data mining utility tailored to a particular algorithm is the ability to pipeline the output of one computation into the input of another, e.g. scheduling a model to be trained and then tested once the training has completed.
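
The train-then-test pipelining mentioned above can be sketched as follows. This is only an illustration in Python, not the proposed implementation: the function names, the file names used with them, and the default values for cost and gamma are all hypothetical. The only details taken from LibSVM itself are the svm-train and svm-predict command line tools, their argument order, and the -c and -g options. The run parameter is an assumption added so that the sketch can be exercised without LibSVM installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def train_command(training_file, model_file, cost=1.0, gamma=0.5):
    """Build an svm-train command line (-c is the cost, -g the kernel gamma).

    The default values are illustrative, not recommended settings.
    """
    return ["svm-train", "-c", str(cost), "-g", str(gamma),
            training_file, model_file]


def predict_command(test_file, model_file, output_file):
    """Build an svm-predict command line that reads the trained model."""
    return ["svm-predict", test_file, model_file, output_file]


def run_pipeline(commands, run=subprocess.run):
    """Run commands one after another and stop at the first failure.

    `run` is injectable (hypothetical design choice) so the pipeline can be
    tested without the LibSVM binaries. Returns the commands that succeeded.
    """
    completed = []
    for command in commands:
        result = run(command)
        if result.returncode != 0:
            break  # do not test a model whose training failed
        completed.append(command)
    return completed


def run_pipelines(pipelines, max_workers=2, run=subprocess.run):
    """Run several independent pipelines concurrently, one thread each."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_pipeline, p, run) for p in pipelines]
        return [future.result() for future in futures]
```

With LibSVM on the path, a call such as run_pipelines([[train_command("a.train", "a.model"), predict_command("a.test", "a.model", "a.out")]]) would train a model and test it only if training succeeded. Threads are enough here because the heavy computation happens in the child processes, so each pipeline can occupy its own processor.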

Approach and Uniqueness

The design of the software under consideration will make use of the Model-View-Controller (MVC) paradigm, which is common in GUI applications. MVC is a composite design pattern in which the interface is represented by a View object that displays the current state of a Model object, with user input handled by a Controller object[4]. Separating these responsibilities into distinct modules helps to keep objects loosely coupled, so that one module can be altered with minimal effect on the others. The use of interfaces, with concrete objects instantiated only via a method implementing the Factory design pattern, will also help to minimize class dependencies[2].
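
The division of responsibilities described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the design of the finished software: every class and function name here (TaskView, TaskModel, TaskController, ConsoleView, create_task) and the status strings are hypothetical.

```python
from abc import ABC, abstractmethod


class TaskView(ABC):
    """Interface for anything that displays the state of a task."""

    @abstractmethod
    def render(self, name, status):
        ...


class TaskModel:
    """Holds task state and notifies attached views when it changes."""

    def __init__(self, name):
        self.name = name
        self.status = "queued"
        self._views = []

    def attach(self, view):
        self._views.append(view)
        view.render(self.name, self.status)  # show the initial state

    def set_status(self, status):
        self.status = status
        for view in self._views:
            view.render(self.name, self.status)


class TaskController:
    """Translates user actions into changes on the model."""

    def __init__(self, model):
        self._model = model

    def start(self):
        self._model.set_status("running")

    def cancel(self):
        self._model.set_status("cancelled")


class ConsoleView(TaskView):
    """One concrete view; it records each rendered line for inspection."""

    def __init__(self):
        self.lines = []

    def render(self, name, status):
        self.lines.append(f"{name}: {status}")


def create_task(name, view_factory=ConsoleView):
    """Factory function: the only place that names concrete classes."""
    model = TaskModel(name)
    view = view_factory()
    model.attach(view)
    return TaskController(model), view
```

Calling create_task("train-iris") and then start() on the returned controller leaves the view holding one line for the queued state and one for the running state. Because callers depend only on the TaskView interface and the factory, a different view class could be passed as view_factory without touching the model or the controller.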

The software will be implemented using Microsoft's .Net Framework, with C# as the primary language. Java is the customary choice for software of this nature, but the author's comfort level is much higher with C#, and familiarity with certain aspects of .Net, such as multi-threading, is an asset to this project. So that the LibSVM portion of the program can be updated easily, it shall be compiled into a separate DLL file. The source code shall be downloaded from the website of the LibSVM authors in accordance with their copyright and altered only to accommodate the passing of result messages. Because some utilities in the LibSVM package are implemented in the Python programming language, software that allows Python code to run within the .Net Framework may also be used.

Results and Contributions

The finished product is to be made freely available in both source and binary forms. Ideally, the software will be useful to users of LibSVM and will find a place in knowledge discovery research. If the program does prove to be useful, I intend to maintain it and to investigate ways to extend its functionality. Extensibility is an important consideration in this project and is the primary motivation for keeping class dependencies to an absolute minimum.

References

  1. Chang, C.-C. and Lin, C.-J. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.

  2. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

  3. Goebel, M. and Gruenwald, L. “A Survey of Data Mining and Knowledge Discovery Tools”. In SIGKDD Explorations. June 1999.

  4. Riehle, D. “Composite Design Patterns”. In Proceedings of the 12th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '97). GA, Oct 1997.

  5. Witten, I. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco, 2005.