Naive Bayes
Probabilistic classifier using Bayes' theorem with the conditional independence assumption.
The Mathematics
Bayes' theorem, conditional independence, and Gaussian likelihood
Bayes' Theorem
Classification by posterior probability. We want the class that maximizes the probability given the observed features:

$$\hat{y} = \arg\max_k P(C_k \mid x) = \arg\max_k \frac{P(x \mid C_k)\, P(C_k)}{P(x)}$$

Since $P(x)$ is constant across classes, we only need to maximize the numerator: $P(x \mid C_k)\, P(C_k)$.
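A quick numeric sanity check (the priors and likelihoods below are made up for illustration, not taken from the demo dataset): rescaling by $P(x)$ changes the scores but never the winning class.

priors = {0: 0.6, 1: 0.4}                 # made-up P(C_k)
likelihoods = {0: 0.02, 1: 0.05}          # made-up P(x | C_k) for one query x

unnormalized = {k: likelihoods[k] * priors[k] for k in priors}   # P(x|C_k) P(C_k)
evidence = sum(unnormalized.values())                            # P(x)
posteriors = {k: v / evidence for k, v in unnormalized.items()}  # P(C_k | x)

# Dividing by P(x) rescales every class by the same constant,
# so the argmax is identical either way.
assert max(unnormalized, key=unnormalized.get) == max(posteriors, key=posteriors.get)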
Conditional Independence
The “naive” assumption: features are independent given the class label. This factorizes the joint likelihood:

$$P(x \mid C_k) = \prod_{j=1}^{d} P(x_j \mid C_k)$$
This is rarely true in practice, yet Naive Bayes often performs surprisingly well — especially in high dimensions where estimating the full joint is infeasible.
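A minimal sketch of what the factorization buys us computationally (the per-feature likelihood values below are hypothetical placeholders for any per-feature density):

import numpy as np

per_feature = np.array([0.31, 0.12, 0.58])   # hypothetical P(x_j | C_k), j = 1..3

joint = np.prod(per_feature)                 # P(x | C_k) = prod_j P(x_j | C_k)
log_joint = np.sum(np.log(per_feature))      # same quantity, computed in log space

print(joint, np.exp(log_joint))              # both ≈ 0.0216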
Gaussian Likelihood
For continuous features we model each feature as Gaussian, $x_j \mid C_k \sim \mathcal{N}(\mu_{jk}, \sigma_{jk}^2)$:

$$P(x_j \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{jk}^2}} \exp\!\left(-\frac{(x_j - \mu_{jk})^2}{2\sigma_{jk}^2}\right)$$

Parameters $\mu_{jk}$ and $\sigma_{jk}$ are estimated from the training points in class $C_k$.
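A short sketch of the estimation step, assuming X_c holds the training rows of a single class (the numbers are illustrative, not the demo data):

import numpy as np

X_c = np.array([[1.0, 2.0],
                [1.5, 2.4],
                [0.8, 1.9]])               # illustrative rows of one class

mu = X_c.mean(axis=0)                      # per-feature means  mu_{jk}
sigma = X_c.std(axis=0) + 1e-9             # per-feature stds   sigma_{jk} (tiny floor for stability)

x_query = np.array([1.2, 2.1])
# Gaussian density of each query feature under this class's parameters
pdf = np.exp(-0.5 * ((x_query - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)
print(pdf)                                 # one P(x_j | C_k) per feature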
Log Posterior
To avoid numerical underflow with many features, work in log space. The prediction is:

$$\hat{y} = \arg\max_k \left[\log P(C_k) + \sum_{j=1}^{d} \log P(x_j \mid C_k)\right]$$
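Substituting the Gaussian density into this sum gives the per-feature term that the implementation below evaluates directly:

$$\log P(x_j \mid C_k) = -\frac{1}{2}\left[\left(\frac{x_j - \mu_{jk}}{\sigma_{jk}}\right)^2 + \log\!\left(2\pi\sigma_{jk}^2\right)\right]$$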
Where It Shines
Despite the naive independence assumption, Naive Bayes excels at text classification (spam filters, sentiment analysis) where features are bag-of-words counts. It also trains in $O(nd)$ time (a single pass through the $n$ training points and $d$ features) and handles missing features gracefully.
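As a hedged sketch of the text case: with word-count features the per-feature Gaussian is typically swapped for a multinomial likelihood with Laplace smoothing. The vocabulary and counts below are made up for illustration.

import numpy as np

X = np.array([[2, 0, 1],     # made-up bag-of-words counts, class 0
              [1, 1, 0],     # class 0
              [0, 3, 2]])    # class 1
y = np.array([0, 0, 1])
x_query = np.array([0, 2, 1])

log_posteriors = {}
for c in np.unique(y):
    X_c = X[y == c]
    log_prior = np.log(len(X_c) / len(X))
    counts = X_c.sum(axis=0) + 1                  # Laplace smoothing
    theta = counts / counts.sum()                 # per-class word probabilities
    # Multinomial log-likelihood (dropping the count-only normalizing coefficient)
    log_posteriors[c] = log_prior + np.sum(x_query * np.log(theta))

print(max(log_posteriors, key=log_posteriors.get))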
See It Work
Watch Naive Bayes estimate class distributions and classify the query point.
[Interactive demo: data and per-class Gaussian distributions; log posteriors log P(C₀|x) and log P(C₁|x); query point x₁ = 3.5, x₂ = 4.]
Dataset: 12 points, 2 classes. We will classify the query point (3.5, 4.0).
The Code
Bridge from mathematical formulation to Python implementation
Mathematical Formulation
Estimate class prior from training data: $P(C_k) = n_k / n$, the fraction of training points in class $k$
Compute per-class mean for each feature: $\mu_{jk} = \frac{1}{n_k} \sum_{i:\, y_i = k} x_{ij}$
Compute per-class standard deviation for each feature: $\sigma_{jk} = \sqrt{\frac{1}{n_k} \sum_{i:\, y_i = k} (x_{ij} - \mu_{jk})^2}$
Log-likelihood under Gaussian assumption (naive independence): $\log P(x \mid C_k) = \sum_{j} \log \mathcal{N}(x_j;\, \mu_{jk}, \sigma_{jk}^2)$
Log of class prior probability: $\log P(C_k)$
Log-likelihood of query under class k Gaussian: $\sum_{j} \log P(x_j \mid C_k)$
Predict class with highest log-posterior: $\hat{y} = \arg\max_k \left[\log P(C_k) + \sum_{j} \log P(x_j \mid C_k)\right]$
Python Implementation
import numpy as np

def gaussian_nb(X_train, y_train, x_query):
    classes = np.unique(y_train)
    priors, means, stds = {}, {}, {}

    # Estimate P(C_k), mu_{jk}, and sigma_{jk} from the training data.
    for c in classes:
        X_c = X_train[y_train == c]
        priors[c] = len(X_c) / len(X_train)
        means[c] = X_c.mean(axis=0)
        stds[c] = X_c.std(axis=0) + 1e-9  # small floor avoids division by zero

    # Sum of per-feature log Gaussian densities (naive independence).
    def log_likelihood(x, mu, sigma):
        return -0.5 * np.sum(
            ((x - mu) / sigma)**2 + np.log(2*np.pi*sigma**2)
        )

    # Log-posterior up to a constant: log P(C_k) + log P(x | C_k).
    log_posteriors = {}
    for c in classes:
        log_prior = np.log(priors[c])
        log_like = log_likelihood(x_query, means[c], stds[c])
        log_posteriors[c] = log_prior + log_like

    # Predict the class with the highest log-posterior.
    prediction = max(log_posteriors, key=log_posteriors.get)
    return prediction, log_posteriors
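A usage sketch with illustrative data (the demo's actual 12 points are not reproduced here; two synthetic clusters stand in for the two classes) and the query from above:

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[2.0, 2.0], scale=0.8, size=(6, 2))   # synthetic class-0 cluster
X1 = rng.normal(loc=[5.0, 5.0], scale=0.8, size=(6, 2))   # synthetic class-1 cluster
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 6 + [1] * 6)

pred, scores = gaussian_nb(X_train, y_train, np.array([3.5, 4.0]))
print(pred, scores)   # predicted class and per-class log-posteriors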