Logistic Regression
Binary classification using the sigmoid function and cross-entropy loss.
The Mathematics
Sigmoid function, cross-entropy loss, and gradient descent
The Sigmoid Function
Logistic regression maps a linear combination $z = w^\top x + b$ through the sigmoid function to produce a probability:

$$\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}}$$

This gives $0 < \hat{p} < 1$. We classify as 1 when $\hat{p} \ge 0.5$, equivalently when $z \ge 0$.
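A minimal sketch of the sigmoid and the resulting decision rule; the `sigmoid` and `predict` helpers are illustrative, not part of the implementation shown later:

import numpy as np

def sigmoid(z):
    # Map any real-valued logit to a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    # Classify as 1 when sigmoid(w·x + b) >= threshold, i.e. when the logit >= 0
    z = np.dot(w, x) + b
    return int(sigmoid(z) >= threshold)

# sigmoid(0) = 0.5; large positive logits approach 1, large negative logits approach 0
print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ≈ [0.0067, 0.5, 0.9933]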
Binary Cross-Entropy Loss
The loss function penalizes confident wrong predictions heavily:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right]$$

When $y_i = 1$, only the $\log \hat{p}_i$ term remains, pushing $\hat{p}_i$ toward 1. Vice versa for $y_i = 0$.
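A quick numerical check of this behavior, using an illustrative `bce` helper for a single prediction; the `eps` clipping is an assumption added to avoid log(0):

import numpy as np

def bce(y, p, eps=1e-12):
    # Binary cross-entropy for one prediction; eps guards against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# True label is 1: confident-and-correct is cheap, confident-and-wrong is expensive
print(bce(1, 0.99))   # ≈ 0.01
print(bce(1, 0.50))   # ≈ 0.69
print(bce(1, 0.01))   # ≈ 4.61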
Gradient Descent
The gradient has an elegant form identical in structure to linear regression:

$$\frac{\partial L}{\partial w} = \frac{1}{n} X^\top(\hat{p} - y), \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\hat{p}_i - y_i)$$
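As a sanity check, the analytic gradient can be compared against a finite-difference estimate of the loss; the toy data, current parameter values, and step size below are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # toy features (illustrative)
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy labels
w, b = np.array([0.3, -0.2]), 0.1            # arbitrary current parameters

def loss(w, b):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Analytic gradient from the formula above
p = 1 / (1 + np.exp(-(X @ w + b)))
dw = X.T @ (p - y) / len(X)

# Finite-difference check on the first weight
h = 1e-6
numeric = (loss(w + np.array([h, 0.0]), b) - loss(w - np.array([h, 0.0]), b)) / (2 * h)
print(dw[0], numeric)                        # the two values should agree closely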
Connection to Linear Regression
The gradient has the same form for both linear and logistic regression! The only difference is that the prediction $\hat{p} = \sigma(z)$ passes through the sigmoid, while in linear regression $\hat{y} = z$ is used directly.
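A sketch of that shared structure, assuming an illustrative `gradients` helper whose `model` flag switches the prediction between sigmoid and identity (and assuming the linear model's squared-error loss carries the usual 1/2 factor so the constants match):

import numpy as np

def gradients(X, y, w, b, model="logistic"):
    # Identical gradient expression; only the prediction changes
    z = X @ w + b
    pred = 1 / (1 + np.exp(-z)) if model == "logistic" else z   # sigmoid vs. identity
    dw = X.T @ (pred - y) / len(X)
    db = np.mean(pred - y)
    return dw, db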
Decision Boundary
The decision boundary is where $\hat{p} = 0.5$, i.e. $z = 0$:

$$w_1 x_1 + w_2 x_2 + b = 0$$
This is a straight line in 2D, or a hyperplane in higher dimensions.
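For plotting in 2D, this can be rearranged to $x_2 = -(w_1 x_1 + b) / w_2$ (assuming $w_2 \neq 0$); a small helper sketch with made-up parameter values:

import numpy as np

def boundary_x2(x1, w, b):
    # Solve w1*x1 + w2*x2 + b = 0 for x2 (assumes w[1] != 0)
    return -(w[0] * x1 + b) / w[1]

# Example: with w = (1, 2) and b = -1 the boundary is x2 = (1 - x1) / 2
w, b = np.array([1.0, 2.0]), -1.0
x1 = np.linspace(-3, 3, 5)
print(boundary_x2(x1, w, b))   # [ 2.    1.25  0.5  -0.25 -1.  ]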
See It Work
Watch the decision boundary separate two classes
[Interactive demo: a "Data & Decision Boundary" plot and a "BCE Loss Over Iterations" curve. Parameters start at w₁ = 0, w₂ = 0, b = 0, giving a BCE loss of 0.693.]
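The starting loss of 0.693 is not arbitrary: with all parameters at zero, every prediction is $\sigma(0) = 0.5$, so the BCE equals $\ln 2 \approx 0.693$ regardless of the labels. A one-line check:

import numpy as np

# With w = 0 and b = 0 every prediction is sigmoid(0) = 0.5,
# so the BCE is -log(0.5) = ln(2) for every point, whatever its label
print(-np.log(0.5))   # 0.6931...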
The Code
Bridge from mathematical formulation to Python implementation
Mathematical Formulation
Initialize weight vector and bias to zero: $w = \mathbf{0}$, $b = 0$
Compute the linear combination (logit): $z = Xw + b$
Apply sigmoid to get probability: $\hat{p} = \sigma(z) = \frac{1}{1 + e^{-z}}$
Binary cross-entropy loss: $L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right]$
Gradient with respect to weights: $\frac{\partial L}{\partial w} = \frac{1}{n} X^\top(\hat{p} - y)$
Update weights via gradient descent: $w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$, $b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$
Python Implementation
import numpy as np

def logistic_regression(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])               # Initialize weights
    b = 0.0                                # Initialize bias
    n = len(X)
    for epoch in range(epochs):            # Training loop
        z = X @ w + b                      # Linear combination (logit)
        p = 1 / (1 + np.exp(-z))           # Sigmoid activation
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # BCE (for monitoring)
        dw = (1 / n) * (X.T @ (p - y))     # Gradient w.r.t. w
        db = (1 / n) * np.sum(p - y)       # Gradient w.r.t. b
        w -= lr * dw                       # Update weights
        b -= lr * db                       # Update bias
    return w, b
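A usage sketch on synthetic two-cluster data; the clusters, seed, and hyperparameters here are illustrative assumptions, not values from the demo above:

import numpy as np

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),    # class 0 cluster
               rng.normal(+1.0, 1.0, size=(100, 2))])   # class 1 cluster
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = logistic_regression(X, y, lr=0.1, epochs=500)
p = 1 / (1 + np.exp(-(X @ w + b)))                       # predicted probabilities
print("accuracy:", np.mean((p >= 0.5) == y))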