Principal Component Analysis

Reduce dimensionality by projecting data onto the directions of maximum variance.

Dimensionality Reduction
Phase 1

The Mathematics

Variance maximization, covariance matrix, and eigendecomposition

Variance Maximization

PCA finds the direction v_1 that maximizes the variance of the projected data:

v_1 = \arg\max_{\|v\|=1}\; v^\top \Sigma v

This is equivalent to finding the leading eigenvector of the covariance matrix \Sigma.
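A minimal numerical sketch of this claim, assuming NumPy and a synthetic correlated dataset (the seed, sample size, and covariance values here are illustrative): no unit direction achieves more projected variance than the leading eigenvector.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 2.0]], size=500)
X_c = X - X.mean(axis=0)                      # center the data
cov = (X_c.T @ X_c) / (len(X_c) - 1)          # sample covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(cov)
v1 = eigenvectors[:, np.argmax(eigenvalues)]  # leading eigenvector

# Variance of the projection onto v1 vs. onto many random unit directions
var_v1 = np.var(X_c @ v1, ddof=1)
random_dirs = rng.normal(size=(1000, 2))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
var_random = np.var(X_c @ random_dirs.T, axis=0, ddof=1)

assert var_v1 >= var_random.max() - 1e-9      # no random direction beats v1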

Covariance Matrix

First center the data: \tilde{x}_i = x_i - \bar{x}. The sample covariance matrix is:

\Sigma = \frac{1}{n-1}\tilde{X}^\top\tilde{X}

Entry \Sigma_{jk} measures how features j and k co-vary. Diagonal entries are the per-feature variances.
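A small check of this formula, assuming NumPy and arbitrary synthetic data: the centered product matches np.cov (which also uses the 1/(n-1) normalization by default), and the diagonal recovers the per-feature sample variances.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # 200 samples, 3 features
X_c = X - X.mean(axis=0)                      # centered data
cov = (X_c.T @ X_c) / (X.shape[0] - 1)        # sample covariance, shape (3, 3)

assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(np.diag(cov), X.var(axis=0, ddof=1))  # diagonal = per-feature variances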

Eigendecomposition

Decompose the symmetric covariance matrix into its eigenvectors (principal directions) and eigenvalues (explained variances):

\Sigma = V \Lambda V^\top

Columns of V are orthonormal eigenvectors; \Lambda = \text{diag}(\lambda_1,\ldots,\lambda_d) with \lambda_1 \geq \lambda_2 \geq \cdots.
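A sketch of this step, assuming NumPy and a small random covariance matrix built for illustration: np.linalg.eigh returns eigenvalues in ascending order, so they are re-sorted to descending order, and the decomposition reconstructs the matrix with orthonormal eigenvectors.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 4))
cov = np.cov(A, rowvar=False)                 # symmetric 4x4 covariance matrix

eigenvalues, V = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]         # sort into descending order
eigenvalues, V = eigenvalues[order], V[:, order]

Lambda = np.diag(eigenvalues)
assert np.allclose(V @ Lambda @ V.T, cov)        # Sigma = V Lambda V^T
assert np.allclose(V.T @ V, np.eye(V.shape[1]))  # columns of V are orthonormal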

Projection

Project the centered data onto the top k principal components to get the lower-dimensional representation:

Z = \tilde{X} V_k \in \mathbb{R}^{n \times k}
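A minimal projection sketch, assuming NumPy and an illustrative 3-D dataset: keep the top-k eigenvectors and map each centered sample to k coordinates. The variance of each score column equals the corresponding eigenvalue.

import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], [[4, 2, 0], [2, 3, 1], [0, 1, 2]], size=300)
X_c = X - X.mean(axis=0)
cov = (X_c.T @ X_c) / (len(X_c) - 1)

eigenvalues, V = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

k = 2
V_k = V[:, :k]                  # d x k matrix of principal directions
Z = X_c @ V_k                   # n x k lower-dimensional representation
assert Z.shape == (300, k)
assert np.allclose(Z.var(axis=0, ddof=1), eigenvalues[:k])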

Choosing k — Scree Plot

Plot the eigenvalues in decreasing order; look for the "elbow" where the curve flattens. The fraction of variance explained by k components is \sum_{j=1}^k \lambda_j \;/\; \sum_j \lambda_j. A common rule of thumb is to retain enough components to explain 95% of the total variance.
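A sketch of the 95% rule, assuming NumPy and synthetic correlated 10-dimensional data (the mixing matrix and seed are illustrative): compute the cumulative fraction of variance explained and keep the smallest k that reaches the threshold.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated 10-D data
X_c = X - X.mean(axis=0)
eigenvalues = np.linalg.eigvalsh((X_c.T @ X_c) / (len(X_c) - 1))
eigenvalues = np.sort(eigenvalues)[::-1]

explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)
k_95 = int(np.searchsorted(cumulative, 0.95) + 1)   # smallest k reaching 95% variance
print(cumulative)
print("retain", k_95, "components")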

Phase 2

See It Work

Watch PCA find the directions of maximum variance

Centered Data & Principal Components

[Interactive demo: scatter plot of the centered data X \in \mathbb{R}^{10\times 2} in the (x̃₁, x̃₂) plane, stepping through the PCA computation.]

Step 1 of 8: Original dataset of 10 points with strong positive correlation.

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

\bar{x} = \frac{1}{n}\sum_i x_i

Compute mean of each feature

\tilde{X} = X - \bar{x}

Center data by subtracting the mean

\Sigma = \frac{1}{n-1}\tilde{X}^\top\tilde{X}

Compute sample covariance matrix

\Sigma v = \lambda v

Eigendecomposition of covariance matrix

\lambda_1 \geq \lambda_2 \geq \cdots

Sort eigenvalues in descending order

V_k = [v_1,\ldots,v_k]

Select top-k eigenvectors as principal components

Z = \tilde{X}V_k

Project centered data onto principal components

\text{Var explained} = \frac{\sum_{j=1}^k \lambda_j}{\sum_j \lambda_j}

Compute fraction of variance explained by k components

Python Implementation

import numpy as np

def pca(X, n_components=1):
    # Center the data by subtracting the per-feature mean.
    X_mean = X.mean(axis=0)
    X_c = X - X_mean
    # Sample covariance matrix with the 1/(n-1) normalization.
    n = X.shape[0]
    cov = (X_c.T @ X_c) / (n - 1)
    # Eigendecomposition of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; sort into descending order.
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Top-k eigenvectors are the principal components.
    components = eigenvectors[:, :n_components]
    # Project the centered data onto the principal components.
    X_proj = X_c @ components
    # Fraction of total variance explained by each retained component.
    total_var = eigenvalues.sum()
    explained = eigenvalues[:n_components] / total_var
    return X_proj, components, explained
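A usage sketch of the function above, assuming NumPy and synthetic 2-D data with strong positive correlation (10 points, echoing the demo in Phase 2; the seed and covariance values are illustrative), reduced to a single principal component.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 2.5], [2.5, 3.0]], size=10)

X_proj, components, explained = pca(X, n_components=1)
print(X_proj.shape)        # (10, 1) scores along the first principal component
print(components.shape)    # (2, 1) unit-norm principal direction
print(explained)           # fraction of variance captured by the first component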