Logistic Regression

Binary classification using the sigmoid function and cross-entropy loss.

Phase 1

The Mathematics

Sigmoid function, cross-entropy loss, and gradient descent

The Sigmoid Function

Logistic regression maps a linear combination z = w^T x + b through the sigmoid function to produce a probability:

\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)

This gives P(y=1 \mid x) = \sigma(w^T x + b). We classify as 1 when \sigma(z) \geq 0.5, equivalently when z \geq 0.
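As a quick numerical sketch of this mapping (the weights, bias, and input below are made-up values for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))       # squashes any real z into (0, 1)

w = np.array([1.5, -0.5])             # illustrative weights
b = -0.2                              # illustrative bias
x = np.array([2.0, 1.0])              # one input point

z = w @ x + b                         # linear combination: 1.5*2.0 - 0.5*1.0 - 0.2 = 2.3
p = sigmoid(z)                        # P(y=1|x) ≈ 0.909
label = int(p >= 0.5)                 # classify as 1, since z >= 0
print(z, p, label)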

Binary Cross-Entropy Loss

The loss function penalizes confident wrong predictions heavily:

\mathcal{L} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\sigma(z_i)) + (1 - y_i)\log(1 - \sigma(z_i))\right]

When y_i = 1, only the \log \sigma(z_i) term remains, pushing \sigma(z_i) toward 1. When y_i = 0, only the \log(1 - \sigma(z_i)) term remains, pushing \sigma(z_i) toward 0.
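A small sketch of that behavior, using hand-picked probabilities rather than model outputs:

import numpy as np

def bce(y, p):
    # Binary cross-entropy averaged over the samples
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 1.0, 0.0])

print(bce(y, np.array([0.95, 0.90, 0.05])))   # confident and correct: ≈ 0.069
print(bce(y, np.array([0.95, 0.90, 0.99])))   # confidently wrong on the last sample: ≈ 1.59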

Gradient Descent

The gradient has an elegant form identical in structure to linear regression:

\nabla_w \mathcal{L} = \frac{1}{n}\sum_{i=1}^{n}(\sigma(z_i) - y_i)\, x_i

\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(\sigma(z_i) - y_i)
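In vectorized form, both gradients take a couple of lines; this is a sketch under the same notation, assuming X is the n-by-d data matrix:

import numpy as np

def gradients(X, y, w, b):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigma(z_i) for every sample
    err = p - y                          # residuals sigma(z_i) - y_i
    dw = X.T @ err / len(X)              # gradient w.r.t. w
    db = err.mean()                      # gradient w.r.t. b
    return dw, db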

Connection to Linear Regression

The gradient \frac{1}{n}X^T(\hat{y} - y) has the same form for both linear and logistic regression! The only difference is that \hat{y} = \sigma(Xw + b) passes through the sigmoid, while in linear regression \hat{y} = Xw + b.
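One way to see the shared structure in code; the use_sigmoid flag here is purely illustrative, switching between the two models' predictions:

import numpy as np

def gradient(X, y, w, b, use_sigmoid=True):
    z = X @ w + b
    y_hat = 1 / (1 + np.exp(-z)) if use_sigmoid else z   # logistic vs. linear prediction
    return X.T @ (y_hat - y) / len(X)                    # same form: (1/n) X^T (y_hat - y)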

Decision Boundary

The decision boundary is where \sigma(z) = 0.5, i.e. z = 0:

w_1 x_1 + w_2 x_2 + b = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}x_1 - \frac{b}{w_2}

This is a straight line in 2D, or a hyperplane in higher dimensions.
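For example, in 2D the boundary line can be traced directly from the learned parameters (the weights and bias here are made up; the sketch assumes w₂ ≠ 0):

import numpy as np

def boundary_x2(x1, w, b):
    # Solve w1*x1 + w2*x2 + b = 0 for x2
    return -(w[0] / w[1]) * x1 - b / w[1]

w = np.array([2.0, 1.0])              # illustrative weights
b = -3.0                              # illustrative bias
x1 = np.linspace(0.0, 7.0, 50)
x2 = boundary_x2(x1, w, b)            # points on the line x2 = -2*x1 + 3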

Phase 2

See It Work

Watch the decision boundary separate two classes

Data & Decision Boundary: scatter plot of the two classes over (x₁, x₂), with the current decision boundary drawn between them.

BCE Loss Over Iterations: curve of the binary cross-entropy loss as training progresses.

Step 1 of 13. Initialize: w₁ = 0, w₂ = 0, b = 0. With all parameters at zero, \sigma(z) = \frac{1}{1+e^{-z}} gives 0.5 for every point, and the BCE loss starts at 0.693.
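A quick sanity check on that starting loss, as a sketch: with w₁ = w₂ = b = 0, every z_i = 0 and \sigma(0) = 0.5, so the BCE loss equals ln 2 ≈ 0.693 regardless of the labels (the label vector below is made up):

import numpy as np

y = np.array([0, 1, 1, 0, 1, 0], dtype=float)    # any mix of labels
p = np.full(len(y), 0.5)                         # w1 = w2 = b = 0  =>  sigma(z) = 0.5 everywhere
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(loss)                                      # ≈ 0.6931, matching the demo's starting value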

Phase 3

The Code

Bridge from mathematical formulation to Python implementation

Mathematical Formulation

w = \mathbf{0},\; b = 0

Initialize weight vector and bias to zero

z_i = w^T x_i + b

Compute the linear combination (logit)

\sigma(z) = \frac{1}{1 + e^{-z}}

Apply sigmoid to get probability

\mathcal{L} = -\frac{1}{n}\sum[y_i\log p_i + (1-y_i)\log(1-p_i)]

Binary cross-entropy loss

\nabla_w \mathcal{L} = \frac{1}{n} X^T(p - y)

Gradient with respect to weights

w \leftarrow w - \alpha \nabla_w \mathcal{L}

Update weights via gradient descent

Python Implementation

import numpy as np

def logistic_regression(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])              # Initialize weights
    b = 0.0                               # Initialize bias
    n = len(X)
    for epoch in range(epochs):           # Training loop
        z = X @ w + b                     # Linear combination (logits)
        p = 1 / (1 + np.exp(-z))          # Sigmoid activation
        loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # BCE
        dw = (1 / n) * X.T @ (p - y)      # Gradient w.r.t. w
        db = np.mean(p - y)               # Gradient w.r.t. b
        w -= lr * dw                      # Update weights
        b -= lr * db                      # Update bias
    return w, b
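A usage sketch on synthetic, linearly separable data, continuing from the function above (the clusters and hyperparameters are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(50, 2))   # class 0 cluster
X1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))   # class 1 cluster
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = logistic_regression(X, y, lr=0.1, epochs=1000)
p = 1 / (1 + np.exp(-(X @ w + b)))                         # predicted probabilities
accuracy = np.mean((p >= 0.5) == y)
print(w, b, accuracy)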