Linear Regression
Fit a line to data by minimizing mean squared error using gradient descent.
The Mathematics
Linear model, MSE loss, and gradient descent optimization
The Model
Linear regression fits a line to data points by finding the weight $w$ and bias $b$ that best explain the relationship:

$$\hat{y} = wx + b$$
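As a minimal sketch of the forward pass (the predict helper below is illustrative, not part of the original implementation):

import numpy as np

def predict(X, w, b):
    # Apply the linear model y_hat = w*x + b element-wise to all inputs
    return w * np.asarray(X, dtype=float) + b

# Example: with w = 2 and b = 1, the inputs 1, 2, 3 map to 3, 5, 7
print(predict([1.0, 2.0, 3.0], w=2.0, b=1.0))   # [3. 5. 7.]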
Loss Function: Mean Squared Error
We measure how wrong our predictions are using the average squared difference between predicted $\hat{y}_i$ and actual $y_i$:

$$L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2$$

This is a convex function in $w$ and $b$, so gradient descent is guaranteed to find the global minimum.
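As a quick sketch, the loss is one line of code (the mse helper is hypothetical, used only to illustrate the formula):

import numpy as np

def mse(y_hat, y):
    # Average squared difference between predictions and targets
    return np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2)

# Example: predictions [3, 5, 7] vs. targets [3, 4, 9] give (0 + 1 + 4) / 3
print(mse([3.0, 5.0, 7.0], [3.0, 4.0, 9.0]))   # 1.666...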
Gradient Descent
We compute the partial derivatives and update parameters in the direction of steepest descent:

$$\frac{\partial L}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i \qquad \frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

Update rule with learning rate $\alpha$:

$$w \leftarrow w - \alpha \frac{\partial L}{\partial w} \qquad b \leftarrow b - \alpha \frac{\partial L}{\partial b}$$
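To make one update concrete, here is a single hand-checkable gradient step on a tiny made-up dataset (the numbers are chosen only for illustration):

import numpy as np

X = np.array([1.0, 2.0])                       # inputs
y = np.array([2.0, 4.0])                       # targets on the line y = 2x
w, b, lr = 0.0, 0.0, 0.1                       # start at zero, learning rate 0.1

y_hat = w * X + b                              # predictions: [0, 0]
dw = (2 / len(X)) * np.sum((y_hat - y) * X)    # (2/2) * (-2*1 + -4*2) = -10
db = (2 / len(X)) * np.sum(y_hat - y)          # (2/2) * (-2 + -4)     = -6
w -= lr * dw                                   # 0 - 0.1 * (-10) = 1.0
b -= lr * db                                   # 0 - 0.1 * (-6)  = 0.6
print(w, b)                                    # 1.0 0.6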
Convergence
Since MSE is convex, gradient descent converges to the global minimum. The optimal solution can also be found analytically via the normal equation:

$$\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$$

where $X$ is the design matrix (the inputs with a column of ones for the bias) and $\mathbf{y}$ is the vector of targets.
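A minimal sketch of the closed-form solution, assuming NumPy (the column of ones absorbs the bias term):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])           # points on the line y = 2x + 1

A = np.column_stack([X, np.ones_like(X)])    # design matrix: [x, 1] per row
w, b = np.linalg.solve(A.T @ A, A.T @ y)     # solve (A^T A) [w, b] = A^T y
print(w, b)                                  # approximately 2.0 and 1.0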
Why Gradient Descent?
While the normal equation handles a problem this small easily, it requires forming and inverting $X^\top X$, which becomes expensive as the number of features grows; gradient descent scales to millions of features and is the foundation for training neural networks.
See It Work
Watch gradient descent fit a line to data
[Interactive demo: "Data & Regression Line" and "Loss Over Iterations" panels; training starts from w = 0, b = 0 with MSE loss 121.32.]
The Code
Bridge from mathematical formulation to Python implementation
Mathematical Formulation
Initialize weight and bias to zero: $w = 0,\; b = 0$
Forward pass: compute predictions for all points: $\hat{y}_i = w x_i + b$
Compute mean squared error loss: $L = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$
Gradient of loss with respect to weight: $\frac{\partial L}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\, x_i$
Gradient of loss with respect to bias: $\frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$
Update weight using gradient descent: $w \leftarrow w - \alpha \frac{\partial L}{\partial w}$
Update bias using gradient descent: $b \leftarrow b - \alpha \frac{\partial L}{\partial b}$
Python Implementation
import numpy as np

def linear_regression(X, y, lr=0.01, epochs=100):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w, b = 0.0, 0.0                                 # Initialize parameters
    n = len(X)
    for epoch in range(epochs):                     # Training loop
        y_hat = w * X + b                           # Forward pass
        loss = np.mean((y_hat - y) ** 2)            # MSE loss
        dw = (2 / n) * np.sum((y_hat - y) * X)      # Gradient w.r.t. w
        db = (2 / n) * np.sum(y_hat - y)            # Gradient w.r.t. b
        w -= lr * dw                                # Update w
        b -= lr * db                                # Update b
    return w, b
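A quick usage sketch on synthetic data (the seed, noise level, and epoch count are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=X.shape)   # noisy line y = 2x + 1

w, b = linear_regression(X, y, lr=0.01, epochs=1000)
print(f"w = {w:.2f}, b = {b:.2f}")                        # should land near w = 2, b = 1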