Logistic Regression

Binary classification using the sigmoid function and cross-entropy loss.

Phase 1

The Mathematics

Sigmoid function, cross-entropy loss, and gradient descent

The Sigmoid Function

Logistic regression maps a linear combination z = w^T x + b through the sigmoid function to produce a probability:

\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)

This gives P(y=1 \mid x) = \sigma(w^T x + b). We classify as 1 when \sigma(z) \geq 0.5, equivalently when z \geq 0.
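As a quick numerical sketch of this mapping (the weights, bias, and input below are made-up values for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))       # squashes any real z into (0, 1)

w = np.array([1.5, -0.5])             # illustrative weights
b = -0.2                              # illustrative bias
x = np.array([2.0, 1.0])              # one input point

z = w @ x + b                         # linear combination: 1.5*2.0 - 0.5*1.0 - 0.2 = 2.3
p = sigmoid(z)                        # P(y=1|x) ≈ 0.909
label = int(p >= 0.5)                 # classify as 1, since z >= 0
print(z, p, label)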

Binary Cross-Entropy Loss

The loss function penalizes confident wrong predictions heavily:

\mathcal{L} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\sigma(z_i)) + (1 - y_i)\log(1 - \sigma(z_i))\right]

When y_i = 1, only the \log \sigma(z_i) term remains, pushing \sigma(z_i) toward 1. When y_i = 0, only the \log(1 - \sigma(z_i)) term remains, pushing \sigma(z_i) toward 0.
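A small sketch of that behavior, using hand-picked probabilities rather than model outputs:

import numpy as np

def bce(y, p):
    # Binary cross-entropy averaged over the samples
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 1.0, 0.0])

print(bce(y, np.array([0.95, 0.90, 0.05])))   # confident and correct: ≈ 0.069
print(bce(y, np.array([0.95, 0.90, 0.99])))   # confidently wrong on the last sample: ≈ 1.59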

Gradient Descent

The gradient has an elegant form identical in structure to linear regression:

\nabla_w \mathcal{L} = \frac{1}{n}\sum_{i=1}^{n}(\sigma(z_i) - y_i)\, x_i

\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\sigma(z_i) - y_i)
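In vectorized form, both gradients take a couple of lines; this is a sketch under the same notation, assuming X is the n-by-d data matrix:

import numpy as np

def gradients(X, y, w, b):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigma(z_i) for every sample
    err = p - y                          # residuals sigma(z_i) - y_i
    dw = X.T @ err / len(X)              # gradient w.r.t. w
    db = err.mean()                      # gradient w.r.t. b
    return dw, db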

Connection to Linear Regression

The gradient \frac{1}{n}X^T(\hat{y} - y) has the same form for both linear and logistic regression! The only difference is that \hat{y} = \sigma(Xw + b) passes through the sigmoid, while in linear regression \hat{y} = Xw + b.
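One way to see the shared structure in code; the use_sigmoid flag here is purely illustrative, switching between the two models' predictions:

import numpy as np

def gradient(X, y, w, b, use_sigmoid=True):
    z = X @ w + b
    y_hat = 1 / (1 + np.exp(-z)) if use_sigmoid else z   # logistic vs. linear prediction
    return X.T @ (y_hat - y) / len(X)                    # same form: (1/n) X^T (y_hat - y)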

Decision Boundary

The decision boundary is where \sigma(z) = 0.5, i.e. z = 0:

w_1 x_1 + w_2 x_2 + b = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}x_1 - \frac{b}{w_2}

This is a straight line in 2D, or a hyperplane in higher dimensions.
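For example, in 2D the boundary line can be traced directly from the learned parameters (the weights and bias here are made up; the sketch assumes w₂ ≠ 0):

import numpy as np

def boundary_x2(x1, w, b):
    # Solve w1*x1 + w2*x2 + b = 0 for x2
    return -(w[0] / w[1]) * x1 - b / w[1]

w = np.array([2.0, 1.0])              # illustrative weights
b = -3.0                              # illustrative bias
x1 = np.linspace(0.0, 7.0, 50)
x2 = boundary_x2(x1, w, b)            # points on the line x2 = -2*x1 + 3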

Phase 2

See It Work

Watch the decision boundary separate two classes

Data & Decision Boundary: scatter plot of the two classes over (x₁, x₂), with the current decision boundary drawn between them.

BCE Loss Over Iterations: curve of the binary cross-entropy loss as training progresses.

Step 1 of 13. Initialize: w₁ = 0, w₂ = 0, b = 0. With all parameters at zero, \sigma(z) = \frac{1}{1+e^{-z}} gives 0.5 for every point, and the BCE loss starts at 0.693.
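A quick sanity check on that starting loss, as a sketch: with w₁ = w₂ = b = 0, every z_i = 0 and \sigma(0) = 0.5, so the BCE loss equals ln 2 ≈ 0.693 regardless of the labels (the label vector below is made up):

import numpy as np

y = np.array([0, 1, 1, 0, 1, 0], dtype=float)    # any mix of labels
p = np.full(len(y), 0.5)                         # w1 = w2 = b = 0  =>  sigma(z) = 0.5 everywhere
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)                                      # ≈ 0.6931, matching the demo's starting value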

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

w = \mathbf{0},\; b = 0

Initialize weight vector and bias to zero

z_i = w^T x_i + b

Compute the linear combination (logit)

\sigma(z) = \frac{1}{1 + e^{-z}}

Apply sigmoid to get probability

\mathcal{L} = -\frac{1}{n}\sum[y_i\log p_i + (1-y_i)\log(1-p_i)]

Binary cross-entropy loss

\nabla_w \mathcal{L} = \frac{1}{n} X^T(p - y)

Gradient with respect to weights

w \leftarrow w - \alpha \nabla_w \mathcal{L}

Update weights via gradient descent

Python Implementation

import numpy as np

def logistic_regression(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])              # Initialize weights
    b = 0.0                               # Initialize bias
    n = len(X)
    for epoch in range(epochs):           # Training loop
        z = X @ w + b                     # Linear combination (logits)
        p = 1 / (1 + np.exp(-z))          # Sigmoid activation
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # BCE
        dw = (1 / n) * X.T @ (p - y)      # Gradient w.r.t. w
        db = np.mean(p - y)               # Gradient w.r.t. b
        w -= lr * dw                      # Update weights
        b -= lr * db                      # Update bias
    return w, b
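A usage sketch on synthetic, linearly separable data, continuing from the function above (the clusters and hyperparameters are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(50, 2))   # class 0 cluster
X1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))   # class 1 cluster
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = logistic_regression(X, y, lr=0.1, epochs=1000)
p = 1 / (1 + np.exp(-(X @ w + b)))                         # predicted probabilities
accuracy = np.mean((p >= 0.5) == y)
print(w, b, accuracy)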