First, recall the SVM dual problem with slack variables:

Dual Problem

$$\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\, x_i^\top x_j\\
\text{s.t.}\quad & \sum_{i=1}^{n}\alpha_i y_i=0,\\
& 0\le\alpha_i\le C,\quad i=1,\dots,n
\end{aligned}$$
Suppose that we have solved the dual problem and obtained the dual optimum $\alpha^*$. Let $S=\{i:\alpha_i^*>0\}$ represent the support set related with $w^*$; let $S_b=\{i:0<\alpha_i^*<C\}$ represent the support set related with $b^*$. Then we can compute the primal optimum:

$$w^*=\sum_{i\in S}\alpha_i^* y_i\, x_i,\qquad b^*=\frac{1}{|S_b|}\sum_{j\in S_b}\Big(y_j-\sum_{i\in S}\alpha_i^* y_i\, x_i^\top x_j\Big)$$
Given a new point $x$, we can perform classification by computing:

$$\hat{y}=\operatorname{sign}\Big(\sum_{i\in S}\alpha_i^* y_i\, x_i^\top x + b^*\Big)$$
According to the formulas above, we notice that in the dual problem, in the computation of $b^*$, and in the classification of new points, the data points always appear inside inner products $x_i^\top x_j$ (or $x_i^\top x$) as a whole.
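The formulas above can be sketched numerically. This is a minimal sketch, not a full solver: it assumes the dual optimum $\alpha^*$ is already available (here hand-derived for a two-point toy set, where $\alpha^*=(0.5,0.5)$, $w^*=1$, $b^*=0$); the names `alpha`, `classify`, and the tolerance values are my own choices.

```python
import numpy as np

# Toy 1-D data with a known dual optimum: for x = [1, -1], y = [+1, -1]
# the dual optimum is alpha* = [0.5, 0.5] (derived by hand), which gives
# the separator w* = 1, b* = 0.
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])   # assumed dual optimum
C = 10.0                        # slack penalty used in the dual

S = np.where(alpha > 1e-8)[0]                           # support set for w*
Sb = np.where((alpha > 1e-8) & (alpha < C - 1e-8))[0]   # support set for b*

w = (alpha[S] * y[S]) @ X[S]       # w* = sum_{i in S} alpha_i y_i x_i
b = np.mean(y[Sb] - X[Sb] @ w)     # average over the margin support vectors

def classify(x_new):
    # sign(w*.x + b*)
    return np.sign(x_new @ w + b)

print(w, b, classify(np.array([0.7])))   # -> [1.] 0.0 1.0
```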
SVM with kernel functions
Mapping points to a higher dimensional space
In some cases, points that are not linearly separable in the current space may become linearly separable after being mapped into a higher-dimensional space.
We define $\phi(\cdot)$ as a mapping function which maps low-dimensional data to high-dimensional data. We can first map our data $x_i\mapsto\phi(x_i)$, then solve the dual problem:

$$\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\phi(x_i)^\top\phi(x_j)\\
\text{s.t.}\quad & \sum_{i=1}^{n}\alpha_i y_i=0,\\
& 0\le\alpha_i\le C,\quad i=1,\dots,n
\end{aligned}$$
We notice again that in the dual problem, in computing $b^*$, and in performing classification, the mapped points always appear inside inner products as a whole. Therefore, we can avoid computing the exact form of $\phi(x)$ and instead directly work with the function that gives the inner product of two mapped points:

$$K(x_i,x_j)=\phi(x_i)^\top\phi(x_j)$$

We call $K(\cdot,\cdot)$ the kernel function.
What is a valid kernel function?
A kernel function $K$ is valid if there exists a mapping function $\phi(\cdot)$ such that $K(x_i,x_j)=\phi(x_i)^\top\phi(x_j)$ holds for any $x_i,x_j$.
Moreover, there is an equivalent characterization of the validity of a kernel function (Mercer's condition):

A kernel function $K$ is valid if and only if, for any samples $x_1,\dots,x_n$, the kernel matrix $\mathbf{K}$ with entries $\mathbf{K}_{ij}=K(x_i,x_j)$ is symmetric and nonnegative (positive semi-) definite.
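This condition is easy to check numerically: a matrix is nonnegative definite exactly when it is symmetric and all its eigenvalues are nonnegative. Below is a small sketch; the function name and the tolerance are my own choices, and the "bad" matrix is just an artificial counterexample (shifting a rank-deficient Gram matrix down by $3I$ forces a negative eigenvalue).

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-10):
    """Numerically check Mercer's condition: K must be symmetric
    and nonnegative (positive semi-) definite."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

K_lin = X @ X.T                       # linear-kernel Gram matrix: always valid
K_bad = X @ X.T - 3.0 * np.eye(5)     # rank(X X^T) <= 3, so this has eigenvalue -3

print(is_valid_kernel_matrix(K_lin))  # True
print(is_valid_kernel_matrix(K_bad))  # False
```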
Examples of Kernel functions

Polynomial kernel function

$$K(x_i,x_j)=\left(x_i^\top x_j + c\right)^d$$

It can be proven that this function is equivalent to first mapping points into a higher-dimensional space and then computing the inner product there.

Gaussian Kernel

$$K(x_i,x_j)=\exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right)$$

Applying the Gaussian kernel is equivalent to first mapping points into an infinite-dimensional space and then computing the inner product there. This can be understood via the Taylor expansion of the exponential function. For a detailed explanation, please see SVM中，高斯核为什么会把原始维度映射到无穷多维？ ("In SVM, why does the Gaussian kernel map the original dimensions to infinitely many dimensions?").
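In practice one only ever needs the Gaussian kernel *matrix*, never the infinite-dimensional map. A small sketch (the function name and the squared-distance expansion $\lVert a-b\rVert^2=\lVert a\rVert^2+\lVert b\rVert^2-2a^\top b$ are my own choices):

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix between the rows of X1 and X2."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)          # squared Euclidean distances
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = gaussian_kernel(X, X)
print(K)   # diagonal is 1; off-diagonal is exp(-0.5) ~= 0.6065
```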
Dual problem with kernel function
With the definition of the kernel function, we can rewrite the dual problem and the classification rule as follows.

Dual Problem

$$\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,K(x_i,x_j)\\
\text{s.t.}\quad & \sum_{i=1}^{n}\alpha_i y_i=0,\\
& 0\le\alpha_i\le C,\quad i=1,\dots,n
\end{aligned}$$
Suppose that we have solved the dual problem and obtained the dual optimum $\alpha^*$. Let $S=\{i:\alpha_i^*>0\}$ represent the support set related with $w^*$; let $S_b=\{i:0<\alpha_i^*<C\}$ represent the support set related with $b^*$. Then we can compute the primal optimum:

$$w^*=\sum_{i\in S}\alpha_i^* y_i\,\phi(x_i),\qquad b^*=\frac{1}{|S_b|}\sum_{j\in S_b}\Big(y_j-\sum_{i\in S}\alpha_i^* y_i\,K(x_i,x_j)\Big)$$
Given a new point $x$, we can perform classification by computing:

$$\hat{y}=\operatorname{sign}\Big(\sum_{i\in S}\alpha_i^* y_i\,K(x_i,x)+b^*\Big)$$

See, $\phi(x)$ is in fact never computed explicitly, since every formula only needs the kernel function!
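The kernelized formulas for $b^*$ and for classification can be sketched as below. This is an illustration only: it assumes $\alpha^*$ has already been obtained from a dual solver, and it reuses the hand-derived two-point toy set (where $\alpha^*=(0.5,0.5)$ under the linear kernel); the function names are my own.

```python
import numpy as np

def kernel_b_star(alpha, X, y, kernel, C, tol=1e-8):
    # b* = average over margin SVs j of (y_j - sum_i alpha_i y_i K(x_i, x_j))
    S = alpha > tol
    Sb = S & (alpha < C - tol)
    K = kernel(X[S], X[Sb])
    return np.mean(y[Sb] - (alpha[S] * y[S]) @ K)

def kernel_classify(x_new, alpha, X, y, b, kernel, tol=1e-8):
    # sign(sum_i alpha_i y_i K(x_i, x_new) + b*)
    S = alpha > tol
    return np.sign((alpha[S] * y[S]) @ kernel(X[S], x_new[None, :])[:, 0] + b)

linear = lambda A, B: A @ B.T   # plug in any valid kernel here

X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])    # assumed dual optimum for this toy set
b = kernel_b_star(alpha, X, y, linear, C=10.0)
print(b, kernel_classify(np.array([0.7]), alpha, X, y, b, linear))  # 0.0 1.0
```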
Solve the dual problem using Gradient Descent Algorithm
We can solve the dual problem using the gradient descent algorithm introduced in the post An Introduction to Support Vector Machines (SVM): Dual problem solution using GD. Simply select a kernel function, such as the polynomial or Gaussian kernel, compute the kernel matrix for the training dataset, compute the gradient, and then iterate the gradient updates to obtain the dual optimum $\alpha^*$. After getting $\alpha^*$, we can compute the primal optimum and classify new points using the equations above.
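The iteration can be sketched as projected gradient ascent on the kernelized dual. This is a rough sketch under stated assumptions, not the exact algorithm of the referenced post: the equality constraint $\sum_i\alpha_i y_i=0$ is handled only approximately, by removing the gradient component along $y$ before clipping to $[0,C]$; the function name and hyperparameters are my own.

```python
import numpy as np

def solve_dual_gd(K, y, C, lr=0.1, n_iter=500):
    """Projected gradient ascent on the kernel SVM dual (a sketch:
    the constraint sum_i alpha_i y_i = 0 is handled approximately by
    projecting the gradient, and box constraints by clipping)."""
    n = len(y)
    Q = np.outer(y, y) * K                # Q_ij = y_i y_j K(x_i, x_j)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        g = 1.0 - Q @ alpha               # gradient of the dual objective
        g -= (g @ y) / (y @ y) * y        # keep updates roughly feasible
        alpha = np.clip(alpha + lr * g, 0.0, C)
    return alpha

X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
K = X @ X.T                                # linear kernel; any kernel matrix works
alpha = solve_dual_gd(K, y, C=10.0)
print(alpha)                               # ~= [0.5, 0.5] for this toy problem
```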
In the next post, I will introduce how to solve the dual problem using Sequential Minimal Optimization (SMO).
