Support Vector Machine

Find the maximum-margin hyperplane that separates two classes, with support vectors defining the boundary.

Classification
Phase 1

The Mathematics

Maximum margin hyperplane, support vectors, and the primal objective

Decision Function

SVM classifies a point by the sign of its distance from the decision hyperplane $\mathbf{w}^\top x + b = 0$:

$$\hat{y}(x) = \text{sign}(\mathbf{w}^\top x + b)$$

Points with $\mathbf{w}^\top x + b > 0$ are classified as class +1, and points with a negative value as class −1.
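As a minimal sketch of the decision rule (the values of $\mathbf{w}$ and $b$ below are arbitrary, chosen only to illustrate the sign computation):

import numpy as np

# Hypothetical parameters, for illustration only
w = np.array([2.0, -1.0])
b = -0.5

points = np.array([[3.0, 1.0],    # w.x + b = 4.5  -> class +1
                   [0.0, 2.0]])   # w.x + b = -2.5 -> class -1

scores = points @ w + b
print(np.sign(scores))            # [ 1. -1.]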

Maximum Margin

The geometric margin — the perpendicular distance between the two parallel margin hyperplanes — is:

$$\gamma = \frac{2}{\|\mathbf{w}\|}$$

SVM finds the maximum-margin hyperplane: the one that maximizes $\gamma$, pushing the two class clouds as far apart as possible.
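For example, if $\mathbf{w} = (3, 4)$ then $\|\mathbf{w}\| = 5$ and $\gamma = 2/5 = 0.4$; a solution with a smaller $\|\mathbf{w}\|$ that still satisfies the constraints yields a wider margin.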

Primal Objective

Maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|^2$ subject to all points being correctly classified outside the margin:

$$\min_{\mathbf{w},b}\; \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.}\quad y_i(\mathbf{w}^\top x_i + b) \geq 1 \;\;\forall i$$

The soft-margin variant adds slack variables $\xi_i \geq 0$ and a penalty term $C\sum_i \xi_i$ to tolerate some misclassifications.
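To make the objective concrete, here is a small sketch (function name and data are illustrative) that evaluates the soft-margin objective for a candidate $(\mathbf{w}, b)$, using $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^\top x_i + b))$:

import numpy as np

def soft_margin_objective(w, b, X, y, C=1.0):
    # (1/2)||w||^2 + C * sum_i xi_i, with xi_i the hinge violation of point i
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(slacks)

# Two points, one per class, both outside the margin of the hyperplane x1 = 0
X = np.array([[ 2.0, 0.0],
              [-2.0, 0.0]])
y = np.array([1.0, -1.0])
print(soft_margin_objective(np.array([1.0, 0.0]), 0.0, X, y))  # 0.5 (no slack needed)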

Support Vectors

The solution depends only on the points that lie on the margin boundaries (or, with soft margins, inside them): the support vectors. The weight vector is a weighted combination of these points:

$$\mathbf{w} = \sum_{i \in SV} \alpha_i y_i x_i$$
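One way to see this relationship in practice is a sketch using scikit-learn's SVC, which stores $\alpha_i y_i$ for each support vector in dual_coef_ (the blob data below is synthetic, for illustration only):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two roughly separable synthetic blobs
X = np.vstack([rng.normal(loc=[-2.0, -2.0], size=(20, 2)),
               rng.normal(loc=[ 2.0,  2.0], size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Rebuild w from the support vectors alone: w = sum_i (alpha_i y_i) x_i
w_from_svs = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_svs, clf.coef_))  # True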

Kernel Trick

Replace $x_i^\top x_j$ with a kernel $K(x_i, x_j) = \phi(x_i)^\top\phi(x_j)$ to implicitly operate in a high-dimensional feature space, enabling nonlinear decision boundaries without ever computing $\phi(x)$ explicitly.
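A common choice is the RBF (Gaussian) kernel. Below is a minimal sketch of computing a kernel matrix directly from dot products and squared norms, without ever forming $\phi(x)$ (the gamma value is an arbitrary example, not prescribed by the text above):

import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X)
print(K.shape)                          # (3, 3)
print(np.allclose(np.diag(K), 1.0))     # True: K(x, x) = 1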

Phase 2

See It Work

Watch the maximum-margin hyperplane converge

[Interactive demo: scatter plot of the two classes (Class −1, Class +1) on x₁ and x₂ axes, with the current decision boundary and support vectors highlighted. A side panel tracks the parameters w₁, w₂, b, and the margin γ over 10 steps; step 1 initializes w = (0, 0), b = 0, so there is no decision boundary yet.]

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

$$\mathbf{w}=\mathbf{0},\; b=0$$

Initialize weight vector and bias to zero

$$y_i(\mathbf{w}^\top x_i+b)<1$$

Check if point is within or violates the margin

$$\mathbf{w} \mathrel{+}= \eta C y_i x_i - \eta\mathbf{w},\qquad b \mathrel{+}= \eta C y_i$$

Update weights and bias for a point that is misclassified or violates the margin

$$\mathbf{w} \mathrel{-}= \eta\mathbf{w}$$

Shrink (regularize) the weights for points that already satisfy the margin

$$\hat{y}=\text{sign}(\mathbf{w}^\top x+b)$$

Predict class label using sign of decision function

$$\gamma=\frac{2}{\|\mathbf{w}\|}$$

Compute margin width from weight vector norm

$$\{\,x_i : |y_i(\mathbf{w}^\top x_i+b)-1|<\epsilon\,\}$$

Identify support vectors near the margin boundary

Python Implementation

import numpy as np

def svm_train(X, y, C=1.0, lr=0.01, epochs=1000):
    """Train a linear soft-margin SVM by per-sample subgradient descent on the hinge loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for epoch in range(epochs):
        for i in range(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # Margin violation: hinge-loss subgradient step plus weight decay
                w += lr * (C * y[i] * X[i] - w)
                b += lr * C * y[i]
            else:
                # Margin satisfied: weight-decay (regularization) step only
                w -= lr * w
    return w, b

def predict(X, w, b):
    """Classify by the sign of the decision function w.x + b."""
    return np.sign(X @ w + b)

def margin_width(w):
    """Geometric margin gamma = 2 / ||w||."""
    return 2 / np.linalg.norm(w)

def support_vectors(X, y, w, b):
    """Return points whose functional margin is approximately 1 (on the margin boundary)."""
    margins = y * (X @ w + b)
    return X[np.abs(margins - 1) < 0.1]
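
A quick usage sketch of the functions above, on toy linearly separable data generated here purely for illustration:

np.random.seed(0)
# Class -1 clustered around (-2, -2), class +1 around (+2, +2)
X = np.vstack([np.random.randn(20, 2) - 2.0, np.random.randn(20, 2) + 2.0])
y = np.array([-1.0] * 20 + [1.0] * 20)

w, b = svm_train(X, y, C=1.0, lr=0.01, epochs=500)
print("train accuracy:", np.mean(predict(X, w, b) == y))
print("margin width:", margin_width(w))
print("num support vectors:", len(support_vectors(X, y, w, b)))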