dict — Internal Structure and RehashingSince Python 3.6, dict is implemented as a compact hash table. The storage is divided into two parts:
┌─────────────────────────────────────┐
│ indices[] — "sparse" index array │ ← hash table (sparse)
├─────────────────────────────────────┤
│ entries[] — "dense" entry array │ ← dense, in insertion order
└─────────────────────────────────────┘
indices[] — Sparse Index Arrayindices = [EMPTY, 2, EMPTY, 0, 1, EMPTY, EMPTY, 3]
-1 ↑ ↑ ↑
│ │ └── index in entries[]
└─────────┘
int8 / int16 / int32 / int64-1 (EMPTY), -2 (DUMMY/deleted), or an index in entries[]entries[] — Dense Array (insertion-ordered!)Each element is a structure consisting of three fields:
// Objects/dictobject.c
typedef struct {
Py_hash_t me_hash; // cached hash of the key (Py_ssize_t)
PyObject *me_key; // pointer to the key
PyObject *me_value; // pointer to the value
} PyDictKeyEntry;
d["foo"]
1. hash_val = hash("foo")
2. slot = hash_val & (len(indices) - 1) # bitmask (faster than mod)
3. idx = indices[slot]
4. if idx == EMPTY → KeyError
5. if entries[idx].me_hash == hash_val
AND entries[idx].me_key is key # check by identity (fast)
OR entries[idx].me_key == key # fallback to equality (slow)
→ found
6. otherwise → probe (iterate through next slots)
// Simplified from dictobject.c
perturb = hash_val;
while (True):
slot = (slot * 5 + perturb + 1) & mask
perturb >>= PERTURB_SHIFT // PERTURB_SHIFT = 5
// check indices[slot]
This is not linear probing and not double hashing — the formula guarantees visiting all slots in the table for any power-of-2 size.
# Threshold: load factor > 2/3
# Specifically: used > (size * 2) // 3
# where used = number of occupied + deleted (dummy) slots
size=8 → resize when used > 5
size=16 → resize when used > 10
size=32 → resize when used > 21
// minused = number of actual live keys
// Find the smallest power of two > minused * 4 (if dict is small)
// or > minused * 2 (if large, > 50,000)
new_size = 8
while new_size <= minused * growth_factor:
new_size <<= 1
Growth factor for small dicts: × 4
Growth factor for large dicts: × 2
OLD TABLE NEW TABLE
indices[8] indices[32] (new empty array)
entries[5 entries] entries[] (same objects!)
Steps:
1. Allocate new indices[] of the required size, fill with EMPTY
2. Iterate through entries[] (dense array — NO gaps!)
3. For each entry: new slot = me_hash & new_mask
4. Insert index into the new indices[] (with probing on collisions)
5. entries[] is NOT copied — only indices[] is recalculated!
Key advantage of the compact format: resizing doesn't require moving the data itself — only reindexing
indices[]. Insertion order is preserved automatically.
d = {i: i for i in range(100)}
del d[50] # what happens?
indices[slot] = DUMMY (-2) ← "tombstone", not EMPTY!
entries[idx].me_key = NULL
entries[idx].me_value = NULL
Dummy is needed because otherwise probing chains would be broken:
Insert: A→slot5, B→slot5(collision)→slot7
Delete A: slot5=EMPTY
Search B: slot5==EMPTY → KeyError! ← INCORRECT without dummy
Resize down happens only during the next insertion if:
(used - dummy) * 4 <= size
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
p1 = Point(1, 2)
p2 = Point(3, 4)
Instances of the same class share ONE indices[] + keys[]
and have their own values[]
p1.__dict__ → keys=[shared], values=[1, 2]
p2.__dict__ → keys=[shared], values=[3, 4]
If all keys are str, CPython uses an optimized lookup (lookdict_unicode) that skips type checks.
| Aspect | Details |
|---|---|
| Structure | indices[] (sparse) + entries[] (dense) |
| Table Size | Power of 2, minimum 8 |
| Resize Threshold | used > size * 2/3 |
| New Size | × 4 (small) / × 2 (> 50k entries) |
| Probing | Perturbation (slot = slot*5 + perturb + 1) |
| Deletion | DUMMY slots, no immediate shrink |
| Order Guarantee | Insertion order (official since 3.7) |
| Lookup Complexity | O(1) amortized |