Support Vector Machines operate within this modelling framework by trying to linearly separate data points. Suppose we have some training data \{(x^t, y^t)\}_{t \in T} with labels y^t \in \{-1, +1\}. SVMs use the kernel trick to implicitly map the data into a higher-dimensional feature space whilst still keeping the computation relatively simple. Let K represent such a kernel. Then we solve the following dual optimisation problem:

\max_{\alpha} \; \sum_{t \in T} \alpha_t - \frac{1}{2} \sum_{s \in T} \sum_{t \in T} \alpha_s \alpha_t y^s y^t K(x^s, x^t),
such that

\alpha_t \ge 0 \mbox{ for all } t \in T, \quad \mbox{and} \quad \sum_{t \in T} \alpha_t y^t = 0.
We then turn this into a classifier by setting

f(x) = \mbox{sign} \left( \sum_{t \in T} \alpha_t y^t K(x^t, x) + b \right),

where
b = y^s - \sum_{t \in T} \alpha_t y^t K(x^t, x^s), \mbox{ for any } s \in T \mbox{ such that } \alpha_s \neq 0.
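As a minimal sketch of how the fitted dual solution is used, the snippet below evaluates the classifier and the offset b for a two-point toy problem. The multipliers \alpha_1 = \alpha_2 = 1/2 are assumed (they can be worked out by hand for this symmetric case rather than by a QP solver), and the linear kernel stands in for a general kernel K.

```python
import numpy as np

# Toy training set T = {0, 1}: x^0 = -1 with label -1, x^1 = +1 with label +1.
X = np.array([-1.0, 1.0])
y = np.array([-1.0, 1.0])

def K(a, b):
    # Linear kernel; any positive semi-definite kernel would be used the same way.
    return a * b

# Dual multipliers: assumed values, worked out by hand for this symmetric toy
# problem rather than obtained from a solver.
alpha = np.array([0.5, 0.5])

# Offset b, computed from any support vector s with alpha_s != 0.
s = 1
b = y[s] - sum(alpha[t] * y[t] * K(X[t], X[s]) for t in range(len(X)))

def f(x):
    # Classifier: sign of the kernel expansion plus the offset.
    return np.sign(sum(alpha[t] * y[t] * K(X[t], x) for t in range(len(X))) + b)

print(b)        # 0.0
print(f(2.0))   # 1.0
print(f(-3.0))  # -1.0
```

Here the resulting decision boundary is x = 0, as expected by symmetry of the two training points.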