Sitemap

A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.

Pages

Posts

EM Algorithm and Gaussian Mixture Model for Clustering

13 minute read

Published:

In the last post on the EM algorithm, we introduced the derivation of the EM algorithm and used it to solve the MLE of the heads probability of two coins. In this post, we apply the EM algorithm to a more practical and useful problem, the Gaussian Mixture Model (GMM), and discuss using GMM for clustering.
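As a quick preview of what GMM clustering looks like in practice, here is a minimal sketch (my own illustration, not taken from the post) that fits a two-component GMM to synthetic 2-D data with scikit-learn's `GaussianMixture`, which runs EM internally.

```python
# Minimal GMM clustering sketch on synthetic data (illustration only,
# not the post's implementation). EM runs inside GaussianMixture.fit().
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters drawn from different Gaussians.
X = np.vstack([rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
               rng.normal(loc=[3, 3], scale=0.5, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0)
labels = gmm.fit_predict(X)   # cluster assignment for each point
print(gmm.means_)             # estimated component means, near [0, 0] and [3, 3]
```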

An Introduction to Expectation-Maximization (EM) Algorithm

15 minute read

Published:

The Expectation-Maximization (EM) algorithm is a special case of MLE where the observations (data samples \(\mathbf{x}\)) are inherently related to some hidden variables (\(\mathbf{z}\)). First of all, we need to review the basics of MLE.

Maximum Likelihood Estimation (MLE)

Let \(\{\mathbf{x}^{(i)}\},\ i=1,\dots,n\) be a set of independent and identically distributed observations, and let \(\mathbf{\theta}\) be the parameters of the data distribution, which are unknown to us. The maximum likelihood estimate of the parameters \(\theta\) is the value that maximizes the joint distribution \(p_\theta(\mathbf{x}^{(1)},\dots,\mathbf{x}^{(n)})= \prod_{i=1}^{n}p_\theta(\mathbf{x}^{(i)})\):

$$ \hat{\theta}_\text{MLE} = \arg\ \max_{\theta}{\prod_{i=1}^{n}p_\theta(\mathbf{x}^{(i)})} $$

More commonly, we choose to maximize the joint log-likelihood:

$$ \hat{\theta}_\text{MLE} = \arg\ \max_{\theta}{\sum_{i=1}^{n}\log p_\theta(\mathbf{x}^{(i)})} $$
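As a concrete toy example (my own, not from the post): for i.i.d. samples from a 1-D Gaussian with known unit variance, the log-likelihood is maximized at the sample mean, which a direct numerical search over candidate \(\theta\) values recovers.

```python
# Numerical check of MLE for the mean of a unit-variance Gaussian
# (toy illustration; the grid search stands in for a proper optimizer).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)      # observations x^(i), true mean 2.0

# Gaussian log-density with unit variance: log p_theta(x) = -0.5*(x - theta)^2 - 0.5*log(2*pi)
def log_likelihood(theta):
    return np.sum(-0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi))

thetas = np.linspace(0.0, 4.0, 2001)              # candidate values of theta (the mean)
theta_mle = thetas[np.argmax([log_likelihood(t) for t in thetas])]
print(theta_mle, x.mean())                        # both approximately 2.0
```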

An Introduction to Support Vector Machines (SVM): Dual problem solution using Gradient Descent

4 minute read

Published:

Recall the SVM primal problem and dual problem:
Primal Problem

$$ \begin{align} \min_{\mathbf{w},b}\ \ & \frac{1}{2}\|\mathbf{w}\|^2\\ \text{s.t.}\ \ & y_i(\mathbf{w}^T\mathbf{x}_i+b)\geq 1,\ i=1,\dots,n \end{align} $$

Dual Problem

$$ \begin{align} \max_{\lambda}\ \ & \sum_{i=1}^{n}\lambda_i - \frac{1}{2}\sum_{i,j}\lambda_i \lambda_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j\\ \text{s.t.}\ \ & \lambda_i \geq 0,\ i=1,\dots,n\\ & \sum_{i=1}^{n}\lambda_i y_i = 0 \end{align} $$

In the last post we introduced how to apply Lagrangian duality to SVM and how to recover the primal optimum once we have the dual optimum. In this post we mainly discuss how to solve the dual problem and obtain the dual optimum.
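As a rough illustration of one such approach (my own sketch, not necessarily the post's exact scheme), the dual can be maximized by projected gradient ascent; for simplicity the bias term \(b\) is dropped here, which removes the equality constraint \(\sum_i \lambda_i y_i = 0\), so the only projection needed is clipping \(\lambda\) at zero.

```python
# Projected gradient ascent on the (bias-free) SVM dual -- a sketch, not the
# post's exact algorithm. Dual objective: sum_i lambda_i - 0.5 * lambda^T Q lambda,
# with Q_ij = y_i y_j x_i^T x_j and lambda_i >= 0.
import numpy as np

rng = np.random.default_rng(0)
# Linearly separable toy data: class +1 around (2, 2), class -1 around (-2, -2).
X = np.vstack([rng.normal(2, 0.5, size=(50, 2)), rng.normal(-2, 0.5, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

K = X @ X.T                                  # Gram matrix x_i^T x_j
Q = (y[:, None] * y[None, :]) * K
lam = np.zeros(len(y))
lr = 1e-3

for _ in range(5000):
    grad = 1.0 - Q @ lam                     # gradient of the dual objective
    lam = np.maximum(lam + lr * grad, 0.0)   # ascent step + projection onto lambda >= 0

w = (lam * y) @ X                            # primal weights recovered from the dual optimum
print(w, np.sum(lam > 1e-6))                 # separating direction, number of support vectors
```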

An Introduction to Support Vector Machines (SVM): Gradient Descent Solution

5 minute read

Published:

In the last post, we discussed that the SVM optimization problem is:

$$\min_{\mathbf{w},b}\ \frac{1}{2}\|\mathbf{w}\|^2,\ \ \text{s.t.}\ \ y_i(\mathbf{w}^T\mathbf{x}_i+b)\geq 1,\ \ i=1,\dots,n$$

There are multiple ways to solve this optimization problem. One is to treat it as a standard optimization problem and use the gradient descent algorithm to compute the optimal parameters. Another is to formulate the Lagrangian dual of the primal problem, transforming the original optimization problem into an easier one. Here we mainly discuss the first method.
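One common way to make the constrained primal amenable to gradient descent is to replace the constraints with a hinge-loss penalty; the following is a minimal sketch under that assumption (my own illustration, not necessarily the post's exact formulation).

```python
# (Sub)gradient descent on the hinge-loss surrogate of the SVM primal:
#   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
# Sketch only; the post may use a different formulation.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, size=(50, 2)), rng.normal(-2, 0.5, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w, b, C, lr = np.zeros(2), 0.0, 1.0, 1e-3
for _ in range(2000):
    margins = y * (X @ w + b)
    viol = margins < 1                                   # points violating y_i(w^T x_i + b) >= 1
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # parameters of the learned separating hyperplane
```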

An Introduction to Support Vector Machines (SVM): Basics

4 minute read

Published:

The Support Vector Machine (SVM) is a method for classification (and can also be used for regression). Here we mainly discuss the most common application: the binary classification problem.

Given a training dataset with binary labels {+1, -1}, SVM aims to find a separating hyperplane that maximizes the margin.
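To make "margin" concrete, here is a tiny sketch (my own, with made-up points): the geometric margin of a candidate hyperplane \((\mathbf{w}, b)\) is \(\min_i y_i(\mathbf{w}^T\mathbf{x}_i+b)/\|\mathbf{w}\|\), and SVM searches for the hyperplane that maximizes it.

```python
# Geometric margin of candidate hyperplanes on a few made-up points
# (illustration only; the labels and coordinates are arbitrary).
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def geometric_margin(w, b):
    # Smallest signed distance of any point to the hyperplane w^T x + b = 0.
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

print(geometric_margin(np.array([1.0, 1.0]), 0.0))   # ~2.12, a good candidate
print(geometric_margin(np.array([1.0, 3.0]), 0.0))   # ~1.58, a more tilted, worse candidate
```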

portfolio

publications

talks

teaching