Hash Table

Map keys to values using a hash function for constant-time average lookups and inserts.

O(1) avgData Structures
Phase 1f(x)

The Mathematics

Hash functions, collision resolution, and expected constant-time access

Hash Function

A hash function hh maps keys from a universe UU to bucket indices in [0,m)[0, m):

h:U{0,1,,m1}h: U \to \{0, 1, \ldots, m-1\}

A simple example: h(k)=kmodmh(k) = k \bmod m. The goal is to distribute keys uniformly across buckets.

Collisions

When h(k1)=h(k2)h(k_1) = h(k_2) for k1k2k_1 \neq k_2, a collision occurs. Since Um|U| \gg m, collisions are inevitable (by the pigeonhole principle).

U>mk1k2:h(k1)=h(k2)|U| > m \Rightarrow \exists\, k_1 \neq k_2 : h(k_1) = h(k_2)

Chaining

Each bucket stores a linked list (chain) of all elements that hash to that index. Insert appends to the list; search scans the list:

\bullet Insert: O(1)O(1) — append to chain at h(k)h(k)

\bullet Search: O(1+α)O(1 + \alpha) expected — scan chain of expected length α\alpha

Load Factor & Expected Complexity

The load factor α=n/m\alpha = n/m is the average number of elements per bucket:

α=nm\alpha = \frac{n}{m}

Under simple uniform hashing, the expected time for search is O(1+α)O(1 + \alpha). If we keep α=O(1)\alpha = O(1) by resizing when α\alpha exceeds a threshold, all operations are O(1)O(1) expected (amortized for resizing).

Phase 2

See It Work

Inserting keys with h(k) = k mod 7 — watch for collisions and chaining

[0][1][2][3][4][5][6]
h(k)=kmodm,  m=7h(k) = k \bmod m,\; m = 7
Step 1 of 8

Empty hash table with 7 buckets. We use h(k) = k mod 7 with chaining for collisions.

Phase 3</>

The Code

Mapping hash table operations to a Python class with chaining

Mathematical Formulation

m=sizem = \text{size}

Create m empty buckets (linked lists for chaining)

h(k)=kmodmh(k) = k \bmod m

Hash function maps key to a bucket index in [0, m)

bucket[h(k)].append(k)\text{bucket}[h(k)].\text{append}(k)

Insert by appending to the chain at the hashed bucket

kbucket[h(k)]k \in \text{bucket}[h(k)]

Search by scanning the chain at the hashed bucket

α=n/m\alpha = n/m

Load factor: ratio of elements to buckets. Expected O(1+α) search

Python Implementation

class HashTable:
    def __init__(self, size=7):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        return key % self.size       # Hash function

    def insert(self, key):
        idx = self._hash(key)        # Compute bucket
        self.buckets[idx].append(key) # Chain append

    def search(self, key):
        idx = self._hash(key)        # Compute bucket
        return key in self.buckets[idx]  # Linear scan

    def load_factor(self):
        n = sum(len(b) for b in self.buckets)
        return n / self.size         # α = n/m