Recall the kernel SVM dual problem:
$$ \max_{\lambda}\ \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j y_i y_j K(x_i, x_j)\\ \text{s.t.}\quad 0 \le \lambda_i \le C,\quad \sum_{i=1}^{n}\lambda_i y_i = 0 $$
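To make the objective concrete, here is a minimal NumPy sketch of the dual objective and a feasibility check (the names `dual_objective` and `is_feasible` are my own, not from any library):

```python
import numpy as np

def dual_objective(lam, y, K):
    # W(lambda) = sum_i lam_i - 1/2 * sum_ij lam_i lam_j y_i y_j K_ij
    u = lam * y
    return lam.sum() - 0.5 * u @ K @ u

def is_feasible(lam, y, C, tol=1e-9):
    # Dual constraints: 0 <= lam_i <= C and sum_i lam_i y_i = 0
    return bool(np.all(lam >= -tol) and np.all(lam <= C + tol)
                and abs(lam @ y) <= tol)
```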
We have previously introduced the gradient descent algorithm for solving the dual problem. However, computing the gradient has a high time complexity and puts a heavy burden on memory, especially when the training dataset is large. In this post, I introduce an efficient, lightweight algorithm for solving the dual problem: Sequential Minimal Optimization (SMO).
Sequential Minimal Optimization (SMO)
The algorithm of SMO is:
Initialization: let $\lambda = (\lambda_1, \ldots, \lambda_n)$ be a set of values which satisfies the dual constraints.
Repeat:
(1) heuristically select two multipliers $\lambda_a, \lambda_b$, and keep all the others fixed;
(2) optimize the dual objective with respect to $\lambda_a, \lambda_b$;
Until: the KKT conditions are satisfied to a certain accuracy.
First question about the initialization: how do we find a set of values which satisfies the dual constraints?
The answer is to simply set $\lambda_i = 0$ for $i = 1, \ldots, n$: every box constraint $0 \le \lambda_i \le C$ holds, and $\sum_i \lambda_i y_i = 0$.
Suppose that we have finished the initialization, and pick a pair $\lambda_a, \lambda_b$ to optimize while keeping all the other $\lambda_i$ fixed. Then, as a function of $\lambda_a$ and $\lambda_b$, the dual objective is
$$ W(\lambda_a, \lambda_b) = \lambda_a + \lambda_b - \frac{1}{2}K_{aa}\lambda_a^2 - \frac{1}{2}K_{bb}\lambda_b^2 - y_a y_b K_{ab}\lambda_a\lambda_b - y_a\lambda_a v_a - y_b\lambda_b v_b + \text{const}, $$
where $K_{ij} = K(x_i, x_j)$ and $v_i = \sum_{j \neq a,b}\lambda_j y_j K_{ij}$.
Moreover, according to the dual constraint $\sum_i \lambda_i y_i = 0$, we have
$$ \lambda_a y_a + \lambda_b y_b = -\sum_{i \neq a,b}\lambda_i y_i = \xi, $$
where $\xi$ is a constant during this step. So we have $\lambda_a = (\xi - \lambda_b y_b)y_a$ (using $y_a^2 = 1$), and substituting it into $W$ leaves a function of $\lambda_b$ alone.
$W(\lambda_b)$ is concave with respect to $\lambda_b$, since $W''(\lambda_b) = -(K_{aa} + K_{bb} - 2K_{ab}) \le 0$ due to the fact that the kernel matrix is nonnegative definite (see the last post An Introduction to Support Vector Machines (SVM): kernel functions). Therefore, we can find the optimal value of $\lambda_b$ which maximizes $W$ by computing the derivative and setting it to 0.
By solving this equation, we will get the solution for $\lambda_b$:
$$ \lambda_b^{\text{new}} = \frac{y_b\left[y_b - y_a + \xi(K_{aa} - K_{ab}) + v_a - v_b\right]}{K_{aa} + K_{bb} - 2K_{ab}} $$
The numerator is too complicated to compute directly, since $v_a$ and $v_b$ contain many terms. Next, we show that $\lambda_b^{\text{new}}$ can actually be computed from the old $\lambda_b^{\text{old}}$.
Before updating the value of $\lambda_b$, we first use the old multipliers to perform the classification on data $x_i$:
$$ f(x_i) = \sum_{j=1}^{n}\lambda_j^{\text{old}} y_j K(x_j, x_i) + b^{\text{old}} $$
Based on the expression of $f(x_i)$, we can write $v_i = f(x_i) - b^{\text{old}} - \lambda_a^{\text{old}} y_a K_{ai} - \lambda_b^{\text{old}} y_b K_{bi}$ for $i = a, b$. We denote the prediction error $E_i = f(x_i) - y_i$; substituting everything back, the update simplifies to
$$ \lambda_b^{\text{new}} = \lambda_b^{\text{old}} + \frac{y_b(E_a - E_b)}{K_{aa} + K_{bb} - 2K_{ab}} $$
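The error-based update can be sketched in NumPy as follows (a minimal illustration; `unclipped_update` is my own name, and the degenerate case returns `None` for the caller to handle):

```python
import numpy as np

def unclipped_update(a, b, lam, y, K, bias):
    # f(x_i) = sum_j lam_j y_j K_ji + b, computed for all points at once
    f = (lam * y) @ K + bias
    E = f - y                          # prediction errors E_i = f(x_i) - y_i
    eta = K[a, a] + K[b, b] - 2 * K[a, b]
    if eta == 0:                       # degenerate pair: select another one
        return None
    return lam[b] + y[b] * (E[a] - E[b]) / eta
```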
Discussion: what if $K_{aa} + K_{bb} - 2K_{ab} = 0$? In this case $W(\lambda_b)$ is a first-degree (linear) function. It is still concave, but the expression for $\lambda_b^{\text{new}}$ above is no longer meaningful (we would divide by zero), so we simply select another pair $\lambda_a, \lambda_b$ and redo the computation above.
Note that the expression for $\lambda_b^{\text{new}}$ above is not clipped, so for clarity we name it $\lambda_b^{\text{new,unclipped}}$. It is inadequate to only compute this value: we need to further clip it to the meaningful domain determined by the dual constraints. According to the dual constraints, each $\lambda_i$ has a box constraint $0 \le \lambda_i \le C$.
We also know that $\lambda_a y_a + \lambda_b y_b = \xi$. Based on whether $y_a = y_b$ or not, the relationship between $\lambda_a$ and $\lambda_b$ under the box constraints is shown in the figure below.
According to the figure, we can get the lower bound $L$ and upper bound $H$ for a meaningful solution of the new $\lambda_b$:
 if $y_a \neq y_b$:
$$ L = \max(\xi y_b, 0)\\ H = \min(C+\xi y_b, C) $$  if $y_a = y_b$:
$$ L = \max(0, \xi y_b - C)\\ H = \min(C, \xi y_b) $$ Based on $L$ and $H$, we can get the clipped new $\lambda_b$:
$$ \lambda_b^{\text{new}} = \begin{cases} H, & \lambda_b^{\text{new,unclipped}} > H\\ \lambda_b^{\text{new,unclipped}}, & L \le \lambda_b^{\text{new,unclipped}} \le H\\ L, & \lambda_b^{\text{new,unclipped}} < L \end{cases} $$
This is the final meaningful new value of $\lambda_b$. For simplicity, in the following we use $\lambda_b^{\text{new}}$ to refer to the clipped value.
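The clipping step is a one-liner in code; here is a sketch with $\xi$ and the labels passed in explicitly (`clip_lambda_b` is a hypothetical helper name):

```python
def clip_lambda_b(lam_b_unclipped, xi, y_a, y_b, C):
    # Bounds from the box constraints, depending on whether the labels agree
    if y_a != y_b:
        L, H = max(0.0, xi * y_b), min(C, C + xi * y_b)
    else:
        L, H = max(0.0, xi * y_b - C), min(C, xi * y_b)
    return min(max(lam_b_unclipped, L), H)
```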
After getting $\lambda_b^{\text{new}}$, we need to compute $\lambda_a^{\text{new}}$. Since $\lambda_a y_a + \lambda_b y_b = \xi$ must hold before and after the update:
$$ \lambda_a^{\text{new}} = \lambda_a^{\text{old}} + y_a y_b(\lambda_b^{\text{old}} - \lambda_b^{\text{new}}) $$
Now, we need to decide whether to update the value of $b$. If $0 < \lambda_a^{\text{new}} < C$, then $x_a$ is a support vector which is located exactly on the margin, so $f(x_a) = y_a$ must hold. Therefore, we can update $b$ as:
$$ b^{\text{new}} = b^{\text{old}} - E_a - y_a(\lambda_a^{\text{new}} - \lambda_a^{\text{old}})K_{aa} - y_b(\lambda_b^{\text{new}} - \lambda_b^{\text{old}})K_{ab} $$
Otherwise, if $0 < \lambda_b^{\text{new}} < C$, we can update $b$ as:
$$ b^{\text{new}} = b^{\text{old}} - E_b - y_a(\lambda_a^{\text{new}} - \lambda_a^{\text{old}})K_{ab} - y_b(\lambda_b^{\text{new}} - \lambda_b^{\text{old}})K_{bb} $$
Note that if neither $0 < \lambda_a^{\text{new}} < C$ nor $0 < \lambda_b^{\text{new}} < C$ holds, here we choose not to update $b$.
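The $\lambda_a$ and $b$ updates can be combined into one small function (a sketch; the names are mine, and the implicit `else` keeps the old bias, matching the choice above):

```python
def update_a_and_bias(lam_a, lam_b, lam_b_new, y_a, y_b,
                      E_a, E_b, K_aa, K_ab, K_bb, bias, C):
    # Recover lambda_a from the equality constraint lam_a*y_a + lam_b*y_b = xi
    lam_a_new = lam_a + y_a * y_b * (lam_b - lam_b_new)
    da = y_a * (lam_a_new - lam_a)     # y_a * delta(lambda_a)
    db = y_b * (lam_b_new - lam_b)     # y_b * delta(lambda_b)
    if 0 < lam_a_new < C:              # x_a sits exactly on the margin
        bias = bias - E_a - da * K_aa - db * K_ab
    elif 0 < lam_b_new < C:            # x_b sits exactly on the margin
        bias = bias - E_b - da * K_ab - db * K_bb
    return lam_a_new, bias
```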
Now, we have finished one single iteration in SMO.
Before we summarize the algorithm of SMO, there is one more trick that improves computational efficiency.
 Computation of $\xi$:
In the deduction above, $\xi$ is used in computing $L$ and $H$. Computing $\xi$ via $-\sum_{i \neq a,b}\lambda_i y_i$ would be time consuming. Instead, we can use the equivalent equation
$$ \xi = \lambda_a^{\text{old}} y_a + \lambda_b^{\text{old}} y_b $$
By substituting this expression of $\xi$ into the expressions of $L$ and $H$, we have:
 if $y_a \neq y_b$ (so $\xi y_b = \lambda_b^{\text{old}} - \lambda_a^{\text{old}}$):
$$ L = \max(0, \lambda_b^{\text{old}} - \lambda_a^{\text{old}})\\ H = \min(C, C + \lambda_b^{\text{old}} - \lambda_a^{\text{old}}) $$
 if $y_a = y_b$ (so $\xi y_b = \lambda_a^{\text{old}} + \lambda_b^{\text{old}}$):
$$ L = \max(0, \lambda_a^{\text{old}} + \lambda_b^{\text{old}} - C)\\ H = \min(C, \lambda_a^{\text{old}} + \lambda_b^{\text{old}}) $$
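In code, the shortcut means $L$ and $H$ need only the two old multipliers (a sketch; `feasible_bounds` is a hypothetical helper name):

```python
def feasible_bounds(lam_a, lam_b, y_a, y_b, C):
    # Bounds for lambda_b after substituting xi = lam_a*y_a + lam_b*y_b
    if y_a != y_b:   # xi * y_b = lam_b - lam_a
        return max(0.0, lam_b - lam_a), min(C, C + lam_b - lam_a)
    else:            # xi * y_b = lam_a + lam_b
        return max(0.0, lam_a + lam_b - C), min(C, lam_a + lam_b)
```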
Sequential Minimal Optimization Algorithm
According to the deduction above, we can write the pseudo-algorithm of SMO.
Initialization: $\lambda_i = 0$ for $i = 1, \ldots, n$, $b = 0$, and precalculation of the kernel matrix $K$.
Repeat:
heuristically (or randomly) select a pair $(\lambda_a, \lambda_b)$;
if $K_{aa} + K_{bb} - 2K_{ab} = 0$:
continue
if $y_a \neq y_b$:
$L = \max(0, \lambda_b - \lambda_a),\ H = \min(C, C + \lambda_b - \lambda_a)$
else:
$L = \max(0, \lambda_a + \lambda_b - C),\ H = \min(C, \lambda_a + \lambda_b)$
compute $E_a$, $E_b$ and $\lambda_b^{\text{new,unclipped}} = \lambda_b + y_b(E_a - E_b)/(K_{aa} + K_{bb} - 2K_{ab})$;
if $\lambda_b^{\text{new,unclipped}} > H$: $\lambda_b^{\text{new}} = H$
else if $\lambda_b^{\text{new,unclipped}} < L$: $\lambda_b^{\text{new}} = L$
else: $\lambda_b^{\text{new}} = \lambda_b^{\text{new,unclipped}}$
$\lambda_a^{\text{new}} = \lambda_a + y_a y_b(\lambda_b - \lambda_b^{\text{new}})$
if $0 < \lambda_a^{\text{new}} < C$: update $b$ using $E_a$
else if $0 < \lambda_b^{\text{new}} < C$: update $b$ using $E_b$
Until: maximum iteration reached, or the dual objective function cannot be further maximized within a certain accuracy.
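Putting the steps together, here is a compact, runnable NumPy sketch with random pair selection and a fixed iteration budget. This is a simplification of full SMO, which would use heuristics for pair selection and a KKT-based stopping test; all names are my own:

```python
import numpy as np

def smo(K, y, C=1.0, max_iter=1000, tol=1e-5, seed=0):
    """Simplified SMO: random pairs, fixed iteration budget."""
    rng = np.random.default_rng(seed)
    n = len(y)
    lam = np.zeros(n)
    bias = 0.0
    for _ in range(max_iter):
        a, b = rng.choice(n, size=2, replace=False)
        eta = K[a, a] + K[b, b] - 2 * K[a, b]
        if eta <= 0:                      # degenerate pair (<= for safety)
            continue
        f = (lam * y) @ K + bias          # current predictions f(x_i)
        E_a, E_b = f[a] - y[a], f[b] - y[b]
        lam_b_unc = lam[b] + y[b] * (E_a - E_b) / eta
        if y[a] != y[b]:
            L, H = max(0.0, lam[b] - lam[a]), min(C, C + lam[b] - lam[a])
        else:
            L, H = max(0.0, lam[a] + lam[b] - C), min(C, lam[a] + lam[b])
        if H - L < tol:                   # no room to move: pick another pair
            continue
        lam_b_new = min(max(lam_b_unc, L), H)
        lam_a_new = lam[a] + y[a] * y[b] * (lam[b] - lam_b_new)
        da = y[a] * (lam_a_new - lam[a])
        db = y[b] * (lam_b_new - lam[b])
        if 0 < lam_a_new < C:
            bias = bias - E_a - da * K[a, a] - db * K[a, b]
        elif 0 < lam_b_new < C:
            bias = bias - E_b - da * K[a, b] - db * K[b, b]
        lam[a], lam[b] = lam_a_new, lam_b_new
    return lam, bias
```

On a toy linearly separable pair of points with a linear kernel, this converges in one pass to multipliers $0.5$ for both points and a zero bias, classifying both points correctly.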
Cool, isn’t it? Now we are able to solve the dual problem using the SMO algorithm!