Support vector classifier: Requires linear decision boundary
Support vector machine: Requires knowledge of the shape of the decision boundary
Hyperplane
A hyperplane in an $n$-dimensional vector space $V$ over a field $F$ is an $(n-1)$-dimensional affine subspace. This is given by $n-1$ linearly independent vectors $v_1, \dots, v_{n-1}$ and a point $p$. Then the hyperplane is made of the points
$$\{p + a_1 v_1 + \dots + a_{n-1} v_{n-1} : a_1, \dots, a_{n-1} \in F\}.$$
Note that different choices of vectors and point can define the same hyperplane.
Another way to define a hyperplane is with an equation. Take a vector $w$ orthogonal to the subspace spanned by $v_1, \dots, v_{n-1}$; the hyperplane is then the set of points $x$ satisfying
$$w \cdot x = w \cdot p.$$
Intuitively, $w \cdot x - w \cdot p$ measures the (signed) "distance" of a point $x$ away from the hyperplane.
This means a hyperplane cuts the space into two halves: the points with $w \cdot x > w \cdot p$ and those with $w \cdot x < w \cdot p$.
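As a quick sanity check, here is a minimal numpy sketch of testing which side of a hyperplane a point falls on (the vectors $w$ and $p$ are made up for illustration):

```python
import numpy as np

# A hyperplane in R^3 through the point p with orthogonal vector w
# (illustrative values only).
w = np.array([1.0, 2.0, -1.0])
p = np.array([0.0, 1.0, 0.0])
c = w @ p  # the hyperplane is {x : w . x = c}

def side(x):
    """Signed quantity w.x - c: positive on one half, negative on the other."""
    return w @ x - c

print(side(np.array([3.0, 1.0, 0.0])))   # > 0: the "top" half
print(side(np.array([-3.0, 1.0, 0.0])))  # < 0: the "bottom" half
print(side(p))                           # = 0: on the hyperplane itself
```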
This provides a nice starting point for classifiers.
Starting to classify
Linearly separable
Two sets of points $A, B \subseteq \mathbb{R}^n$ are linearly separable if there is some hyperplane, defined by $w \in \mathbb{R}^n$ and $c \in \mathbb{R}$ as $\{x : w \cdot x = c\}$, such that $w \cdot x > c$ for all $x \in A$ and $w \cdot x < c$ for all $x \in B$.
Suppose we have $N$ points of training data with classification labels. This will be represented by $(x_i, y_i)$ for $1 \le i \le N$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$. Let's assume the data is linearly separable, so there exist $w \in \mathbb{R}^n$ and $c \in \mathbb{R}$ such that
$$y_i (w \cdot x_i - c) > 0 \quad \text{for all } 1 \le i \le N.$$
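A sketch checking this condition on toy data (the points and the candidate $(w, c)$ are invented for illustration):

```python
import numpy as np

# Toy training data (x_i, y_i) with y_i in {-1, 1}; values are made up.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane {x : w . x = c}.
w = np.array([1.0, 1.0])
c = 0.0

# (w, c) separates the data iff y_i * (w . x_i - c) > 0 for all i.
separated = np.all(y * (X @ w - c) > 0)
print(separated)  # True for this toy example
```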
There are infinitely many choices of $(w, c)$, and some of these represent the same solution (scaling both by any positive constant leaves the hyperplane unchanged). You can get rid of these by forcing
$$\|w\| = 1.$$
Intuitively this forces the orthogonal vector to be a unit vector. There are still two choices of vector, pointing out of the "top" or the "bottom" of the hyperplane, though this is fixed by the classification labels within our data.
This means that $(w, c)$ now defines a unique classification. However, in most cases there are still infinitely many choices of $(w, c)$ that separate the data.
Best line
The magnitude of
$$M_i = y_i (w \cdot x_i + b)$$
(note I replaced $c$ with $-b$ to make life easier) relates to how far the point $x_i$ is away from the hyperplane. We call this the margin for that point, and ideally we want to maximise the minimum margin. This makes the fewest assumptions about our data and reduces the chance of overfitting.
A hyperplane that achieves this is called a maximal margin classifier. I.e. we are looking for the maximum $M$ such that
$$y_i (w \cdot x_i + b) \ge M \quad \text{for all } 1 \le i \le N.$$
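A sketch computing these margins on the same toy data (values again illustrative):

```python
import numpy as np

# Margins M_i = y_i * (w . x_i + b) with ||w|| = 1.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)  # force the orthogonal vector to be unit
b = 0.0

margins = y * (X @ w + b)
print(margins)        # signed distance of each point from the hyperplane
print(margins.min())  # the minimum margin M we want to maximise
```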
This is an interesting requirement, as most of our data does not really matter for it. There will be a few key points that act as the limits on our data, the ones achieving the minimum margin. These are called support vectors.
(Note the support vectors and the classifier will be unique!)
Constructing the maximal margin classifier
Let's now collect all our requirements for the maximal margin classifier. We want to choose $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ maximising $M$ such that
$$\|w\| = 1 \quad \text{and} \quad y_i (w \cdot x_i + b) \ge M \quad \text{for all } 1 \le i \le N.$$
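In practice this optimisation is usually handed to a library. A minimal sketch using scikit-learn's SVC with a very large C to approximate the hard-margin problem; note scikit-learn uses the scaling $y_i(w \cdot x_i + b) \ge 1$ rather than $\|w\| = 1$, so we rescale afterwards:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin problem (no slack allowed).
clf = SVC(kernel="linear", C=1e10)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Rescale so that ||w|| = 1, matching our convention: then the achieved
# maximum margin M is 1 / ||w|| in scikit-learn's parameterisation.
scale = np.linalg.norm(w)
print(w / scale, b / scale)    # the unit-normal hyperplane
print(1.0 / scale)             # the achieved maximum margin M
print(clf.support_vectors_)    # the support vectors
```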
This is essentially the same as a neural network at this point!
Perceptron (neural network)
A perceptron is a function $f : \mathbb{R}^n \to \mathbb{R}$ defined by two steps: first a weighted linear sum, second an activation function. To define this we need weights $w_i$ for $0 \le i \le n$ and an activation function $\sigma : \mathbb{R} \to \mathbb{R}$, where
$$f(x) = \sigma\left(w_0 + \sum_{i=1}^{n} w_i x_i\right).$$
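A minimal sketch of this definition (the weights and choice of activation are illustrative):

```python
import numpy as np

def perceptron(x, w, sigma):
    """f(x) = sigma(w_0 + sum_i w_i x_i): a weighted linear sum
    followed by an activation function."""
    return sigma(w[0] + w[1:] @ x)

# With the sign function as activation, the perceptron is exactly the
# hyperplane classifier from before: w_0 plays the role of the bias b.
w = np.array([0.0, 1.0, 1.0])
print(perceptron(np.array([2.0, 2.0]), w, np.sign))    # 1.0
print(perceptron(np.array([-1.0, -2.0]), w, np.sign))  # -1.0
```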