ONLINE RESOURCES

http://www.research.microsoft.com/users/jplatt/svm.html

Support Vector Machines


Support Vector Machines were invented by Vladimir Vapnik. They are a method for creating functions from a set of labeled training data. The function can be a classification function (the output is binary: does the input belong to the category?) or a general regression function.

For classification, SVMs operate by finding a hypersurface in the space of possible inputs. This hypersurface attempts to split the positive examples from the negative examples. The split is chosen so that the distance from the hypersurface to the nearest positive and negative examples is as large as possible. Intuitively, this makes the classification correct for test data that is near, but not identical to, the training data. More information can be found in Burges' tutorial or in Vapnik's book (see below).
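To make the maximum-margin idea concrete, here is a small illustrative sketch in Python (not code from this page) using the scikit-learn library, whose SVC class implements this kind of classifier. The toy data points and the parameter C are invented for the example; a large C approximates the hard-margin case described above.

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D data: three positive and three negative examples (made up for illustration).
    X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
                  [-1.0, -1.0], [-1.5, -0.5], [-2.0, -1.5]])
    y = np.array([+1, +1, +1, -1, -1, -1])

    # A large C approximates a hard margin: the separating hyperplane is placed
    # as far as possible from the nearest positive and negative examples.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane: w . x + b = 0
    print("support vectors:", clf.support_vectors_)
    print("geometric margin:", 1.0 / np.linalg.norm(w))
    print("predictions:", clf.predict([[2.2, 2.4], [-1.2, -0.8]]))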

There are various ways to train SVMs. One particularly simple and fast method is Sequential Minimal Optimization (SMO).
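The sketch below shows the flavor of SMO for a linear kernel. It follows the commonly taught simplified variant, which picks the second Lagrange multiplier at random rather than using the full working-set heuristics of the original algorithm; the function name and default parameters are my own choices for the example.

    import numpy as np

    def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
        """Very simplified SMO for a linear kernel; y must be in {-1, +1}."""
        m = X.shape[0]
        alpha, b = np.zeros(m), 0.0
        K = X @ X.T                          # linear kernel matrix
        passes = 0
        while passes < max_passes:
            changed = 0
            for i in range(m):
                Ei = (alpha * y) @ K[:, i] + b - y[i]
                if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                    j = np.random.choice([k for k in range(m) if k != i])
                    Ej = (alpha * y) @ K[:, j] + b - y[j]
                    ai_old, aj_old = alpha[i], alpha[j]
                    # Bounds that keep both multipliers feasible.
                    if y[i] != y[j]:
                        L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                    else:
                        L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
                    if L == H or eta >= 0:
                        continue
                    # Jointly optimize the pair (alpha[i], alpha[j]) analytically.
                    alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                    if abs(alpha[j] - aj_old) < 1e-5:
                        continue
                    alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                    # Update the threshold b.
                    b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] - y[j] * (alpha[j] - aj_old) * K[i, j]
                    b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] - y[j] * (alpha[j] - aj_old) * K[j, j]
                    if 0 < alpha[i] < C:
                        b = b1
                    elif 0 < alpha[j] < C:
                        b = b2
                    else:
                        b = (b1 + b2) / 2.0
                    changed += 1
            passes = passes + 1 if changed == 0 else 0
        w = (alpha * y) @ X                  # weight vector for the linear kernel
        return w, b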

The output of an SVM is an uncalibrated value, not a posterior probability of a class given an input. However, I have recently created an algorithm to map SVM outputs into posterior probabilities. This algorithm is described in

J. Platt, Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, in Advances in Large Margin Classifiers, A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, eds., MIT Press, 1999.
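For illustration, here is a minimal sketch (my own, not code from the paper above) of the kind of mapping the paper describes: the raw SVM output f(x) is passed through a sigmoid P(y=1 | f) = 1 / (1 + exp(A*f + B)), with A and B fit by maximizing the likelihood of held-out labels. The paper additionally recommends regularized targets and a more careful optimizer; plain gradient descent is used here only to keep the sketch short.

    import numpy as np

    def fit_sigmoid(f, y, lr=1e-3, steps=5000):
        """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by maximum likelihood.
        f: raw SVM outputs on a validation set; y: labels in {-1, +1}."""
        t = (y + 1) / 2.0                       # targets in {0, 1}
        A, B = 0.0, 0.0
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(A * f + B))
            # Gradient of the negative log-likelihood with respect to A and B.
            A -= lr * np.sum((t - p) * f)
            B -= lr * np.sum(t - p)
        return A, B

    def svm_posterior(f, A, B):
        """Map new SVM outputs to calibrated probabilities of the positive class."""
        return 1.0 / (1.0 + np.exp(A * f + B))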

Training an SVM on a large data set with many classes can be slow. Along with Nello Cristianini and John Shawe-Taylor, I created a training algorithm called the DAGSVM. The DAGSVM trains in an amount of time that is independent of the number of classes, and evaluates in time that is linear in the number of classes. The training of a DAGSVM is shown to minimize a probabilistic bound on the test error. The DAGSVM is described in

J. Platt, N. Cristianini, and J. Shawe-Taylor, Large Margin DAGs for Multiclass Classification, in Advances in Neural Information Processing Systems 12, pp. 547-553, MIT Press, 2000.
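A brief sketch of how such a decision DAG is evaluated (an illustration of the idea, not the authors' code): with K classes and one binary SVM per pair of classes, the DAG keeps a list of candidate classes, repeatedly tests the first candidate against the last and discards the loser, and after K - 1 evaluations a single class remains, which matches the linear evaluation time noted above. The `pairwise` dictionary of decision functions is a stand-in I have assumed for whatever pairwise SVMs were trained.

    def ddag_predict(x, classes, pairwise):
        """Evaluate a decision DAG over pairwise SVMs.
        `pairwise[(a, b)]` is assumed to be a binary decision function that is
        positive when x looks more like class a than class b."""
        remaining = list(classes)
        while len(remaining) > 1:
            a, b = remaining[0], remaining[-1]   # test first vs. last remaining class
            if pairwise[(a, b)](x) > 0:
                remaining.pop()                  # a wins: eliminate class b
            else:
                remaining.pop(0)                 # b wins: eliminate class a
        return remaining[0]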

References

Web Sites


This page was written by John Platt of the CCSP Group at Microsoft Research. Last updated: 08/07/01.
