Random Forest
Ensemble of decision trees trained on bootstrapped data with feature subsampling.
The Mathematics
Bootstrap aggregation, feature subsampling, and variance reduction
Bias-Variance Decomposition
Averaging $T$ uncorrelated models, each with variance $\sigma^2$, reduces the variance to $\sigma^2 / T$. With pairwise correlation $\rho$ between trees, the ensemble variance becomes:

$$\mathrm{Var} = \rho\,\sigma^2 + \frac{1-\rho}{T}\,\sigma^2$$
Random feature subsampling decorrelates the trees (it lowers $\rho$), so the ensemble variance shrinks further than it would under plain bagging.
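To see the effect of this formula numerically, here is a minimal sketch in Python; the specific values of $\rho$, $\sigma^2$, and $T$ below are illustrative, not from the text.

def ensemble_variance(rho, sigma2, T):
    # Var = rho * sigma^2 + (1 - rho) / T * sigma^2
    return rho * sigma2 + (1 - rho) / T * sigma2

# 100 trees, unit variance: lower correlation gives a much smaller ensemble variance
print(ensemble_variance(0.3, 1.0, 100))  # ~0.307
print(ensemble_variance(0.1, 1.0, 100))  # ~0.109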
Bootstrap Sampling
Each tree trains on a bootstrap sample: $n$ points drawn with replacement from the $n$-point training set. On average, each sample omits about 36.8% of the original points:

$$P(\text{point not selected}) = \left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.368$$

The left-out OOB (out-of-bag) points serve as a free validation set.
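A minimal sketch of this with NumPy; the dataset size n = 1000 is an arbitrary choice for illustration.

import numpy as np

n = 1000
idx = np.random.choice(n, size=n, replace=True)   # bootstrap: n draws with replacement
oob_fraction = 1 - len(np.unique(idx)) / n        # fraction of points never drawn (the OOB set)
print(oob_fraction)                               # typically close to 1/e ~ 0.368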
Feature Subsampling
At each split, only a random subset of $m$ features is considered. For classification with $d$ features, the standard choice is:

$$m = \lfloor \sqrt{d} \rfloor$$
This prevents any single dominant feature from appearing in every tree, creating diverse, decorrelated trees.
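A tiny sketch of that choice; d = 16 features is an arbitrary example, not from the text.

import numpy as np

d = 16
m = int(np.floor(np.sqrt(d)))                                 # m = floor(sqrt(16)) = 4
split_features = np.random.choice(d, size=m, replace=False)   # features available at this split
print(m, split_features)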
Majority Vote
For classification, the final prediction is the class chosen by the most trees:

$$\hat{y} = \operatorname{mode}\{h_1(x),\, h_2(x),\, \ldots,\, h_T(x)\}$$

For regression, the trees are averaged: $\hat{y} = \frac{1}{T}\sum_{t=1}^{T} h_t(x)$.
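A minimal sketch of both rules with made-up per-tree predictions for a single point; the values are illustrative only.

import numpy as np

tree_votes = np.array([1, 0, 1, 1, 0])               # classes predicted by T = 5 trees
y_class = np.bincount(tree_votes).argmax()            # majority vote -> class 1

tree_values = np.array([2.3, 1.9, 2.1, 2.6, 2.0])     # regression outputs of the same 5 trees
y_reg = tree_values.mean()                             # average -> 2.18
print(y_class, y_reg)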
OOB Error
Each data point can be evaluated by the trees that did not train on it, giving an unbiased estimate of generalization error:

$$\text{OOB error} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\!\left[\hat{y}_i^{\text{OOB}} \neq y_i\right]$$

where $\hat{y}_i^{\text{OOB}}$ is the majority vote over only those trees whose bootstrap sample omitted point $i$.
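A minimal sketch of that estimate, assuming each tree's bootstrap indices and its predictions on the full training set are already available; all names here are placeholders, not part of the implementation below.

import numpy as np

def oob_error(y, boot_indices, tree_preds):
    # y: (n,) true labels
    # boot_indices: list of length-n index arrays, one bootstrap sample per tree
    # tree_preds: (T, n) array, tree_preds[t, i] = tree t's prediction for point i
    n = len(y)
    errors, counted = 0, 0
    for i in range(n):
        # only the trees whose bootstrap sample never drew point i may vote on it
        oob_trees = [t for t, idx in enumerate(boot_indices) if i not in idx]
        if not oob_trees:
            continue  # point i landed in every bootstrap sample; skip it
        votes = tree_preds[oob_trees, i].astype(int)
        y_hat = np.bincount(votes).argmax()    # majority vote of OOB trees only
        errors += int(y_hat != y[i])
        counted += 1
    return errors / counted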
Free Feature Importance
Feature importance can be computed by measuring how much OOB error increases when a feature's values are randomly permuted — no separate validation set required.
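A minimal sketch of permutation importance, assuming a fitted model with a predict method and an evaluation set X_eval, y_eval (in the OOB setting these would be each tree's own out-of-bag points); all names are placeholders.

import numpy as np

def permutation_importance(model, X_eval, y_eval, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base_error = np.mean(model.predict(X_eval) != y_eval)       # error with intact features
    importances = np.zeros(X_eval.shape[1])
    for j in range(X_eval.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_eval.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])         # break feature j's link to y
            increases.append(np.mean(model.predict(X_perm) != y_eval) - base_error)
        importances[j] = np.mean(increases)                      # average rise in error
    return importances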
See It Work
Watch trees bootstrap, split, and vote to form the ensemble
[Interactive demo: data and decision boundaries shown alongside the ensemble status. Initial configuration: 5 trees, 12 data points, feature subsampling m = ⌊√2⌋ = 1.]
The Code
Bridge from mathematical formulation to Python implementation
Mathematical Formulation
Bootstrap sample: draw $n$ indices with replacement, $i_1, \ldots, i_n \sim \mathrm{Uniform}\{1, \ldots, n\}$, giving $\mathcal{D}_t = \{(x_{i_j}, y_{i_j})\}_{j=1}^{n}$.
Number of features to consider at each split: $m = \lfloor \sqrt{d} \rfloor$.
Fit a depth-limited decision tree $h_t$ on the bootstrap sample $\mathcal{D}_t$.
Get prediction from tree $t$ for all test points: $\hat{y}^{(t)} = h_t(X_{\text{test}})$.
Stack predictions from all trees into a $T \times n_{\text{test}}$ matrix.
Majority vote: final prediction is the most common class, $\hat{y} = \operatorname{mode}\{\hat{y}^{(1)}, \ldots, \hat{y}^{(T)}\}$.
Python Implementation
import numpy as np
from scipy.stats import mode


def random_forest(X, y, n_trees=5, max_depth=1):
    """Train an ensemble of depth-limited trees on bootstrap samples."""
    trees = []
    for t in range(n_trees):
        # Bootstrap sample: n indices drawn with replacement
        idx = np.random.choice(len(X), len(X), replace=True)
        X_boot, y_boot = X[idx], y[idx]
        # Feature subsampling: m = floor(sqrt(d)) features for this tree
        # (simplification: chosen once per tree here, rather than at every split)
        max_features = int(np.sqrt(X.shape[1]))
        feat_idx = np.random.choice(X.shape[1], max_features, replace=False)
        # Fit a depth-limited tree on the bootstrap sample, restricted to feat_idx
        tree = DecisionTree(max_depth=max_depth)  # DecisionTree assumed defined elsewhere
        tree.fit(X_boot[:, feat_idx], y_boot)
        trees.append((tree, feat_idx))
    return trees


def predict(X, trees):
    """Majority vote over the predictions of every tree in the ensemble."""
    all_preds = []
    for tree, feat_idx in trees:
        # Each tree only sees the features it was trained on
        preds = tree.predict(X[:, feat_idx])
        all_preds.append(preds)
    all_preds = np.array(all_preds)  # shape (n_trees, n_points)
    # Most common class across trees for each test point
    final = mode(all_preds, axis=0, keepdims=True).mode[0]
    return final
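A quick usage sketch, not from the original: scikit-learn's DecisionTreeClassifier can stand in for the assumed DecisionTree class, and the toy data below is purely illustrative.

import numpy as np
from sklearn.tree import DecisionTreeClassifier as DecisionTree  # stand-in for the assumed class

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # simple separable toy labels

forest = random_forest(X, y, n_trees=25, max_depth=3)
y_hat = predict(X, forest)
print("training accuracy:", np.mean(y_hat == y))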