K-Means Clustering

Partition data into k clusters by iteratively assigning points and updating centroids.

Phase 1

The Mathematics

WCSS objective, assignment step, update step, and convergence

Objective: Within-Cluster Sum of Squares

K-Means seeks to partition $n$ data points into $k$ clusters $C_1, \ldots, C_k$ by minimizing the total squared distance from each point to its cluster centroid:

\text{WCSS} = \sum_{j=1}^{k}\sum_{x_i \in C_j} \|x_i - \mu_j\|^2

where $\mu_j$ is the centroid (mean) of cluster $C_j$.
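
As a concrete check, the objective can be evaluated directly from the data, the assignments, and the centroids. A minimal NumPy sketch (the function name wcss and the example arrays are illustrative, not part of the original):

import numpy as np

def wcss(X, assignments, centroids):
    # Sum of squared distances from each point x_i to its assigned centroid mu_{c_i}
    return float(np.sum((X - centroids[assignments]) ** 2))

# Example: two clusters in 2-D
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0]])
assignments = np.array([0, 0, 1])
centroids = np.array([[0.5, 0.0], [9.0, 9.0]])
print(wcss(X, assignments, centroids))  # 0.25 + 0.25 + 0.0 = 0.5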

Lloyd's Algorithm: Two Alternating Steps

1. Assignment Step

Assign each point to the nearest centroid:

c_i = \arg\min_j \|x_i - \mu_j\|^2
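
The assignment step is a nearest-centroid lookup. A vectorized sketch, assuming X is an (n, d) NumPy array and centroids is a (k, d) array (the function name assign_step is illustrative):

import numpy as np

def assign_step(X, centroids):
    # Squared distances ||x_i - mu_j||^2 as an (n, k) matrix
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # c_i = index of the nearest centroid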

2. Update Step

Recompute each centroid as the mean of its assigned points:

\mu_j = \frac{1}{|C_j|}\sum_{x_i \in C_j} x_i
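
The update step averages the points in each cluster. A sketch under the same assumptions as above; leaving an empty cluster's centroid unchanged is one common convention, not the only one:

import numpy as np

def update_step(X, assignments, centroids):
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = X[assignments == j]
        if len(members) > 0:                      # Leave empty clusters unchanged
            new_centroids[j] = members.mean(axis=0)
    return new_centroids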

Convergence

Each step decreases (or maintains) WCSS: the assignment step moves every point to its nearest centroid, and the update step replaces each centroid with the cluster mean, which minimizes the within-cluster sum of squares. Since WCSS is bounded below and there are only finitely many possible assignments, the algorithm is guaranteed to converge. However, it may converge to a local minimum rather than the global one:

\text{WCSS}^{(t+1)} \leq \text{WCSS}^{(t)}
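
The claim about the update step follows from a standard identity: for a fixed cluster $C_j$ with mean $\mu_j$ and any candidate center $\mu$, the cross terms cancel, so no center can beat the mean:

\sum_{x_i \in C_j} \|x_i - \mu\|^2 = \sum_{x_i \in C_j} \|x_i - \mu_j\|^2 + |C_j|\,\|\mu_j - \mu\|^2 \;\geq\; \sum_{x_i \in C_j} \|x_i - \mu_j\|^2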

K-Means++ Initialization

Random initialization can lead to poor local minima. K-Means++ selects initial centroids that are spread far apart, yielding an expected $O(\log k)$-competitive ratio with respect to the optimal WCSS.
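
A minimal sketch of the K-Means++ seeding rule: the first centroid is chosen uniformly at random, and each subsequent centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. The function name kmeans_pp_init and its parameters are illustrative:

import numpy as np

def kmeans_pp_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]          # First centroid: uniform at random
    for _ in range(k - 1):
        # D(x)^2: squared distance from each point to its nearest chosen centroid
        diffs = X[:, None, :] - np.asarray(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=2), axis=1)
        probs = d2 / d2.sum()                      # Sample proportional to D(x)^2
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.asarray(centroids)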

Phase 2

See It Work

Watch K-Means alternate between assign and update steps

Clusters & Centroids

[Interactive scatter plot of the data in the (x₁, x₂) plane, colored by cluster (Cluster 1, Cluster 2, Cluster 3), with centroids μ₁, μ₂, μ₃ marked.]

WCSS Over Iterations

[Interactive chart tracking the WCSS value at each iteration.]

Step 1 of 9 (phase: init; k = 3, with μ₁, μ₂, μ₃ initialized). Initialize 3 centroids at arbitrary positions, then alternate between assign and update steps.

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

\mu_1, \ldots, \mu_k \leftarrow \text{init}

Initialize k centroids (random or heuristic)

\|x_i - \mu_j\|^2 \;\forall j

Compute distance from each point to each centroid

c_i = \arg\min_j \|x_i - \mu_j\|^2

Assign each point to its nearest centroid

\mu_j = \frac{1}{|C_j|}\sum_{x_i \in C_j} x_i

Update centroid to mean of assigned points

\mu^{(t+1)} = \mu^{(t)} \Rightarrow \text{stop}

Stop when centroids no longer change

Python Implementation

import numpy as np

def k_means(X, k=3, max_iter=100, tol=1e-6, seed=0):
    # X is expected to be an (n, d) NumPy array
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # Random init
    assignments = np.zeros(len(X), dtype=int)
    for iteration in range(max_iter):
        # Assignment step
        for i, x in enumerate(X):                          # For each point
            dists = [np.linalg.norm(x - c) for c in centroids]
            assignments[i] = int(np.argmin(dists))         # Nearest centroid
        # Update step
        new_centroids = centroids.copy()
        for j in range(k):                                 # For each cluster
            cluster = X[assignments == j]
            if len(cluster) > 0:                           # Skip empty clusters
                new_centroids[j] = cluster.mean(axis=0)    # New centroid
        if np.allclose(new_centroids, centroids, atol=tol):
            break                                          # Done: centroids unchanged
        centroids = new_centroids
    return assignments, centroids
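
A quick way to exercise the function above on synthetic data; the blob locations, scale, and seed here are illustrative:

import numpy as np

rng = np.random.default_rng(42)
# Three well-separated 2-D blobs of 50 points each
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])
labels, centers = k_means(X, k=3)
print(centers)  # Typically lands near (0, 0), (5, 5), and (0, 5), up to label order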