Support Vector Machine

From Canonica AI

Introduction

A Support Vector Machine (SVM) is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

History

The Support Vector Machine was first introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1960s. The current standard incarnation (soft margin) was proposed by Corinna Cortes and Vapnik in 1995 and is widely used in the field of Machine Learning.

A computer screen displaying a support vector machine model.

Theory

The goal of a support vector machine is to find the optimal separating hyperplane which maximizes the margin of the training data. The decision function is fully specified by a (usually very small) subset of training samples, the support vectors. They are the data points that lie closest to the decision surface (or hyperplane).
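The margin-maximization idea above can be sketched with a minimal linear soft-margin SVM trained by subgradient descent on the hinge loss. The learning rate, regularization strength, step count, and toy data here are illustrative choices, not part of any standard implementation.

```python
import numpy as np

# Minimal sketch: linear soft-margin SVM via subgradient descent on
# (lam/2)||w||^2 + mean(max(0, 1 - y_i (w . x_i + b))).
# Hyperparameters (lam, lr, steps) are illustrative, not tuned.
def fit_linear_svm(X, y, lam=0.01, lr=0.1, steps=1000):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        margins = y * (X @ w + b)
        mask = margins < 1  # points on the wrong side of the margin
        # Subgradient of the regularized hinge loss
        gw = lam * w - (y[mask, None] * X[mask]).sum(0) / len(y)
        gb = -y[mask].sum() / len(y)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Two well-separated clusters in 2-D (hypothetical toy data).
X = np.array([[2., 2.], [3., 3.], [-2., -2.], [-3., -3.]])
y = np.array([1., 1., -1., -1.])
w, b = fit_linear_svm(X, y)
print(np.sign(X @ w + b))  # predicted labels for the training points
```

The points closest to the separating hyperplane (here the inner pair of each cluster) are the ones that end up constraining the solution, which is the role of the support vectors described above.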

Mathematical formulation

SVMs are typically trained by solving a quadratic programming problem with linear constraints, a special type of optimization problem. The solution involves constructing a dual problem and applying the method of Lagrange multipliers, which finds the extrema of a function subject to equality constraints (extended to inequality constraints via the Karush-Kuhn-Tucker conditions).
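The standard soft-margin formulation referred to above can be written out as follows: the primal problem trades margin width against slack penalties, and the dual exposes the quadratic program over the Lagrange multipliers.

```latex
% Primal (soft margin):
\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

% Dual (quadratic program in the multipliers \alpha):
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0
```

Training points with \(\alpha_i > 0\) are exactly the support vectors, and the dual depends on the data only through inner products \(x_i \cdot x_j\), which is what makes the kernel trick below possible.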

Kernel trick

The Kernel trick implicitly maps the input data into a higher-dimensional feature space, where a linear separation may become possible, without ever computing the coordinates in that space explicitly: the kernel function evaluates the inner product between mapped points directly. This is particularly useful when dealing with non-linearly separable data.
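A small sketch makes the trick concrete: the homogeneous degree-2 polynomial kernel on R^2 equals an ordinary dot product in a 3-dimensional feature space, even though the mapping is never needed at prediction time. The example points are arbitrary.

```python
import numpy as np

# Explicit feature map phi for the degree-2 polynomial kernel on R^2:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# The kernel computes the same inner product in the input space.
def poly_kernel(x, z):
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(poly_kernel(x, z))       # kernel value in the input space
print(np.dot(phi(x), phi(z)))  # same value via the explicit mapping
```

The kernel evaluation costs one 2-D dot product, while the explicit map grows with the feature-space dimension; for kernels like the RBF, whose feature space is infinite-dimensional, only the kernel form is feasible at all.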

Types of SVM

There are several types of Support Vector Machines, including:

1. Linear SVM: An SVM is called linear if its decision boundary is linear. The linear SVM is the simplest form of SVM.

2. Non-linear SVM: An SVM is called non-linear if its decision boundary is non-linear. Non-linear SVMs are used when the data cannot be classified by a linear decision boundary.

3. One-class SVM: One-class SVM is used for novelty detection, that is, to find out whether a new observation is an outlier or similar to the training data.
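A non-linear SVM can be sketched by solving the dual problem directly with an RBF kernel. This simplified sketch uses projected gradient ascent and drops the bias term and the equality constraint sum(alpha * y) = 0 to stay short; the XOR data, learning rate, and C are illustrative.

```python
import numpy as np

# RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Simplified dual solver: projected gradient ascent on the dual
# objective, keeping only the box constraint 0 <= alpha <= C.
# (Bias and the equality constraint are dropped for brevity.)
def fit_dual_svm(X, y, C=10.0, lr=0.1, steps=500, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(steps):
        # dW/dalpha_i = 1 - y_i * sum_j alpha_j y_j K_ij
        grad = 1.0 - y * (K @ (alpha * y))
        alpha = np.clip(alpha + lr * grad, 0.0, C)
    return alpha

def predict(X_train, y, alpha, X_new, gamma=1.0):
    K = rbf_kernel(X_new, X_train, gamma)
    return np.sign(K @ (alpha * y))

# XOR: a classic data set that no linear boundary can separate.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = fit_dual_svm(X, y)
print(predict(X, y, alpha, X))  # all four XOR points classified correctly
```

With a linear kernel this data is hopeless, but in the RBF feature space the four points become separable, which is the essence of the non-linear SVM described above.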

Applications

Support Vector Machines are used in a variety of applications, including:

- Image classification
- Text categorization
- Handwriting recognition
- Bioinformatics (protein classification, cancer classification)
- Generalized predictive control

Advantages and Disadvantages

Advantages of Support Vector Machines include:

- Effective in high-dimensional spaces
- Uses a subset of training points in the decision function (the support vectors), so it is also memory efficient
- Versatile: different kernel functions can be specified for the decision function

Disadvantages of Support Vector Machines include:

- If the number of features is much greater than the number of samples, the method is likely to perform poorly
- SVMs do not directly provide probability estimates; these are typically calculated using an expensive five-fold cross-validation

See Also

- Kernel trick
- Machine Learning
- Lagrange multipliers
- Classification (machine learning)
- Regression analysis