Support Vector Machine
Find the maximum-margin hyperplane that separates two classes, with support vectors defining the boundary.
The Mathematics
Maximum-margin hyperplane, support vectors, and the primal objective
Decision Function
SVM classifies a point by the sign of its signed distance from the decision hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$:

$$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b, \qquad \hat{y} = \operatorname{sign}(f(\mathbf{x}))$$

Points with $f(\mathbf{x}) > 0$ are classified as class +1, and points with a negative value as class −1.
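As a quick worked example (the numbers are illustrative, not taken from the demo): for $\mathbf{w} = (2, -1)$ and $b = -0.5$, the point $\mathbf{x} = (1, 1)$ gives

$$f(\mathbf{x}) = 2 \cdot 1 + (-1) \cdot 1 - 0.5 = 0.5 > 0,$$

so it is classified as +1.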
Maximum Margin
The geometric margin, the perpendicular distance between the two parallel margin hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = \pm 1$, is:

$$\gamma = \frac{2}{\|\mathbf{w}\|}$$

SVM finds the maximum-margin hyperplane: the one that maximizes $\gamma$, pushing the two class clouds as far apart as possible.
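For instance (illustrative numbers again): $\mathbf{w} = (3, 4)$ gives $\|\mathbf{w}\| = 5$ and a margin of $2/5 = 0.4$, while scaling the same hyperplane down to $\mathbf{w} = (1.5, 2)$ doubles the margin to $2/2.5 = 0.8$; smaller weights mean a wider margin.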
Primal Objective
Maximizing the margin is equivalent to minimizing $\tfrac{1}{2}\|\mathbf{w}\|^2$ subject to all points being correctly classified outside the margin:

$$\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \ \text{for all } i.$$

The soft-margin variant adds slack variables $\xi_i \ge 0$ and a penalty $C \sum_i \xi_i$ to tolerate some misclassifications:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i.$$
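To make the objective concrete, here is a minimal sketch that evaluates the equivalent unconstrained hinge-loss form of the soft-margin objective (assuming NumPy and the same X, y, C conventions as the implementation below; the function name is ours, not part of the demo). The training loop below performs subgradient descent on exactly this quantity:

import numpy as np

def soft_margin_objective(X, y, w, b, C=1.0):
    # At the optimum, slack xi_i equals the hinge loss max(0, 1 - y_i (w.x_i + b))
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    # (1/2) ||w||^2 + C * sum_i xi_i
    return 0.5 * np.dot(w, w) + C * np.sum(slacks)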
Support Vectors
The solution depends only on the points lying on the margin boundaries (the support vectors). The weight vector is their weighted sum:

$$\mathbf{w} = \sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, \mathbf{x}_i,$$

where the dual coefficients $\alpha_i$ are nonzero only for support vectors.
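As a sketch of that dependence (the dual coefficients alpha are assumed to come from a dual solver; nothing below computes them, and the helper name is ours):

import numpy as np

def weights_from_support_vectors(alpha, y, X):
    # w = sum_i alpha_i * y_i * x_i; alpha_i = 0 for non-support vectors,
    # so only the points on the margin boundaries contribute
    return (alpha * y) @ X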
Kernel Trick
Replace the inner product $\mathbf{x}_i \cdot \mathbf{x}_j$ with a kernel $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$ to implicitly operate in a high-dimensional feature space, enabling nonlinear decision boundaries without ever computing $\phi$ explicitly.
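A minimal sketch of the idea, using an RBF kernel as an example (the linear trainer below does not use kernels; alpha and b are assumed given, e.g. by a dual solver):

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2): an inner product in an implicit
    # feature space, computed without ever building phi(x)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_decision(x, X_train, y_train, alpha, b, gamma=1.0):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    return sum(a * yi * rbf_kernel(xi, x, gamma)
               for a, yi, xi in zip(alpha, y_train, X_train)) + b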
See It Work
Watch the maximum-margin hyperplane converge
Data & Decision Boundary
Current Parameters
Phase
init
Initialize: w=(0,0), b=0. No decision boundary yet.
The Code
Bridge from mathematical formulation to Python implementation
Mathematical Formulation
Initialize the weight vector and bias to zero: $\mathbf{w} = \mathbf{0}$, $b = 0$.
Check whether a point is within or violates the margin: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) < 1$.
Update the weights for a misclassified or margin-violating point: $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(C\, y_i \mathbf{x}_i - \mathbf{w})$, $b \leftarrow b + \eta\, C\, y_i$.
Regularization step for correctly classified points: $\mathbf{w} \leftarrow (1 - \eta)\,\mathbf{w}$.
Predict the class label from the sign of the decision function: $\hat{y} = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)$.
Compute the margin width from the weight vector norm: $2 / \|\mathbf{w}\|$.
Identify support vectors near the margin boundary: $|y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1| < \epsilon$.
Python Implementation
import numpy as np

def svm_train(X, y, C=1.0, lr=0.01, epochs=1000):
    """Train a linear soft-margin SVM by stochastic subgradient descent on the hinge loss."""
    n, d = X.shape
    w = np.zeros(d)  # w = 0
    b = 0.0          # b = 0
    for epoch in range(epochs):
        for i in range(n):
            # Functional margin of point i under the current hyperplane
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge-loss subgradient step
                # plus the regularization term
                w += lr * (C * y[i] * X[i] - w)
                b += lr * C * y[i]
            else:
                # Safely outside the margin: regularization shrink only
                w -= lr * w
    return w, b

def predict(X, w, b):
    # Class label is the sign of the decision function w.x + b
    return np.sign(X @ w + b)

def margin_width(w):
    # Geometric margin between the two margin hyperplanes: 2 / ||w||
    return 2 / np.linalg.norm(w)

def support_vectors(X, y, w, b):
    # Points whose functional margin is within 0.1 of 1 lie near the margin boundary
    margins = y * (X @ w + b)
    return X[np.abs(margins - 1) < 0.1]
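A minimal usage sketch on toy data (the blob parameters and hyperparameters are illustrative; exact numbers will vary with the seed and learning rate):

import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs: class -1 around (-2, -2), class +1 around (+2, +2)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(+2.0, 0.5, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

w, b = svm_train(X, y, C=1.0, lr=0.01, epochs=200)
print("accuracy:", np.mean(predict(X, w, b) == y))
print("margin width:", margin_width(w))
print("support vectors found:", len(support_vectors(X, y, w, b)))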