Principal Component Analysis

Reduce dimensionality by projecting data onto the directions of maximum variance.

Dimensionality Reduction
Phase 1

The Mathematics

Variance maximization, covariance matrix, and eigendecomposition

Variance Maximization

PCA finds the direction v_1 that maximizes the variance of the projected data:

v_1 = \arg\max_{\|v\|=1}\; v^\top \Sigma v

This is equivalent to finding the leading eigenvector of the covariance matrix \Sigma.
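A minimal numerical sketch of this claim, assuming NumPy and a synthetic correlated dataset (the seed, sample size, and covariance values here are illustrative): no unit direction achieves more projected variance than the leading eigenvector.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 2.0]], size=500)
X_c = X - X.mean(axis=0)                      # center the data
cov = (X_c.T @ X_c) / (len(X_c) - 1)          # sample covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(cov)
v1 = eigenvectors[:, np.argmax(eigenvalues)]  # leading eigenvector

# Variance of the projection onto v1 vs. onto many random unit directions
var_v1 = np.var(X_c @ v1, ddof=1)
random_dirs = rng.normal(size=(1000, 2))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
var_random = np.var(X_c @ random_dirs.T, axis=0, ddof=1)

assert var_v1 >= var_random.max() - 1e-9      # no random direction beats v1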

Covariance Matrix

First center the data: \tilde{x}_i = x_i - \bar{x}. The sample covariance matrix is:

\Sigma = \frac{1}{n-1}\tilde{X}^\top\tilde{X}

Entry \Sigma_{jk} measures how features j and k co-vary. Diagonal entries are the per-feature variances.
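A small check of this formula, assuming NumPy and arbitrary synthetic data: the centered product matches np.cov (which also uses the 1/(n-1) normalization by default), and the diagonal recovers the per-feature sample variances.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # 200 samples, 3 features
X_c = X - X.mean(axis=0)                      # centered data
cov = (X_c.T @ X_c) / (X.shape[0] - 1)        # sample covariance, shape (3, 3)

assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(np.diag(cov), X.var(axis=0, ddof=1))  # diagonal = per-feature variances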

Eigendecomposition

Decompose the symmetric covariance matrix into its eigenvectors (principal directions) and eigenvalues (explained variances):

\Sigma = V \Lambda V^\top

Columns of V are orthonormal eigenvectors; \Lambda = \text{diag}(\lambda_1,\ldots,\lambda_d) with \lambda_1 \geq \lambda_2 \geq \cdots.
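A sketch of this step, assuming NumPy and a small random covariance matrix built for illustration: np.linalg.eigh returns eigenvalues in ascending order, so they are re-sorted to descending order, and the decomposition reconstructs the matrix with orthonormal eigenvectors.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 4))
cov = np.cov(A, rowvar=False)                 # symmetric 4x4 covariance matrix

eigenvalues, V = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]         # sort into descending order
eigenvalues, V = eigenvalues[order], V[:, order]

Lambda = np.diag(eigenvalues)
assert np.allclose(V @ Lambda @ V.T, cov)        # Sigma = V Lambda V^T
assert np.allclose(V.T @ V, np.eye(V.shape[1]))  # columns of V are orthonormal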

Projection

Project the centered data onto the top k principal components to get the lower-dimensional representation:

Z = \tilde{X} V_k \in \mathbb{R}^{n \times k}
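A minimal projection sketch, assuming NumPy and an illustrative 3-D dataset: keep the top-k eigenvectors and map each centered sample to k coordinates. The variance of each score column equals the corresponding eigenvalue.

import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0, 0], [[4, 2, 0], [2, 3, 1], [0, 1, 2]], size=300)
X_c = X - X.mean(axis=0)
cov = (X_c.T @ X_c) / (len(X_c) - 1)

eigenvalues, V = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

k = 2
V_k = V[:, :k]                  # d x k matrix of principal directions
Z = X_c @ V_k                   # n x k lower-dimensional representation
assert Z.shape == (300, k)
assert np.allclose(Z.var(axis=0, ddof=1), eigenvalues[:k])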

Choosing k — Scree Plot

Plot the eigenvalues in decreasing order; look for the "elbow" where the curve flattens. The fraction of variance explained by k components is \sum_{j=1}^k \lambda_j \;/\; \sum_j \lambda_j. A common rule of thumb is to retain enough components to explain 95% of the total variance.
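A sketch of the 95% rule, assuming NumPy and synthetic correlated 10-dimensional data (the mixing matrix and seed are illustrative): compute the cumulative fraction of variance explained and keep the smallest k that reaches the threshold.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated 10-D data
X_c = X - X.mean(axis=0)
eigenvalues = np.linalg.eigvalsh((X_c.T @ X_c) / (len(X_c) - 1))
eigenvalues = np.sort(eigenvalues)[::-1]

explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)
k_95 = int(np.searchsorted(cumulative, 0.95) + 1)   # smallest k reaching 95% variance
print(cumulative)
print("retain", k_95, "components")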

Phase 2

See It Work

Watch PCA find the directions of maximum variance

Centered Data & Principal Components

[Interactive demo: scatter plot of the centered data X \in \mathbb{R}^{10\times 2} in the (x̃₁, x̃₂) plane, stepping through the PCA computation.]

Step 1 of 8: Original dataset of 10 points with strong positive correlation.

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

\bar{x} = \frac{1}{n}\sum_i x_i

Compute mean of each feature

\tilde{X} = X - \bar{x}

Center data by subtracting the mean

\Sigma = \frac{1}{n-1}\tilde{X}^\top\tilde{X}

Compute sample covariance matrix

\Sigma v = \lambda v

Eigendecomposition of covariance matrix

\lambda_1 \geq \lambda_2 \geq \cdots

Sort eigenvalues in descending order

V_k = [v_1,\ldots,v_k]

Select top-k eigenvectors as principal components

Z = \tilde{X}V_k

Project centered data onto principal components

\text{Var explained} = \frac{\sum_{j=1}^k \lambda_j}{\sum_j \lambda_j}

Compute fraction of variance explained by k components

Python Implementation

import numpy as np

def pca(X, n_components=1):
    # Center the data by subtracting the per-feature mean.
    X_mean = X.mean(axis=0)
    X_c = X - X_mean
    # Sample covariance matrix with the 1/(n-1) normalization.
    n = X.shape[0]
    cov = (X_c.T @ X_c) / (n - 1)
    # Eigendecomposition of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; sort into descending order.
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    # Top-k eigenvectors are the principal components.
    components = eigenvectors[:, :n_components]
    # Project the centered data onto the principal components.
    X_proj = X_c @ components
    # Fraction of total variance explained by each retained component.
    total_var = eigenvalues.sum()
    explained = eigenvalues[:n_components] / total_var
    return X_proj, components, explained
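A usage sketch of the function above, assuming NumPy and synthetic 2-D data with strong positive correlation (10 points, echoing the demo in Phase 2; the seed and covariance values are illustrative), reduced to a single principal component.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 2.5], [2.5, 3.0]], size=10)

X_proj, components, explained = pca(X, n_components=1)
print(X_proj.shape)        # (10, 1) scores along the first principal component
print(components.shape)    # (2, 1) unit-norm principal direction
print(explained)           # fraction of variance captured by the first component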