Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering
Michael A. Bender, Bradley C. Kuszmaul, William Kuszmaul

TL;DR
This paper revisits linear probing hash tables. It shows that the tombstones left behind by deletions can prevent primary clustering, and it introduces a new variant, graveyard hashing, that achieves asymptotically optimal expected performance even at high load factors.
Contribution
It demonstrates how deletion strategies influence clustering behavior and introduces graveyard hashing, a new method that eliminates primary clustering in linear probing hash tables.
Findings
Tombstones created by deletions can counteract primary clustering effects.
Graveyard hashing guarantees an expected amortized insertion cost of O(x) at load factor 1 − 1/x, even as the load factor approaches 1.
In external memory models, graveyard hashing achieves near-optimal block transfer performance.
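The two mechanisms in the findings (tombstones left by deletions, and graveyard hashing's periodic rebuilds) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain (unordered) linear probing, which may differ from the variant analyzed in the paper, and the rebuild interval of cap/(4x) operations and the cap/(2x) planted tombstones are hypothetical constants chosen for illustration. The class name `GraveyardTable` and its methods are likewise hypothetical.

```python
from collections.abc import Hashable
from typing import List, Optional

EMPTY = object()      # slot unused since the last rebuild
TOMBSTONE = object()  # slot freed by a deletion, or planted during a rebuild


class GraveyardTable:
    """Hypothetical sketch: linear probing with tombstones plus
    graveyard-style rebuilds that plant evenly spaced tombstones."""

    def __init__(self, capacity: int, x: int) -> None:
        self.cap = capacity
        self.x = x  # intended load factor is about 1 - 1/x
        self.slots: List[object] = [EMPTY] * capacity
        self.size = 0
        # Assumed constant: rebuild every cap/(4x) inserts or deletes.
        self.rebuild_every = max(1, capacity // (4 * x))
        self.ops_since_rebuild = 0

    def _home(self, key: Hashable) -> int:
        return hash(key) % self.cap

    def _find(self, key: Hashable) -> Optional[int]:
        # Probe past tombstones; only an EMPTY slot ends the search.
        i = self._home(key)
        for _ in range(self.cap):
            s = self.slots[i]
            if s is EMPTY:
                return None
            if s is not TOMBSTONE and s == key:
                return i
            i = (i + 1) % self.cap
        return None

    def _place(self, key: Hashable) -> None:
        # Take the first EMPTY or TOMBSTONE slot at or after the home slot.
        i = self._home(key)
        while self.slots[i] is not EMPTY and self.slots[i] is not TOMBSTONE:
            i = (i + 1) % self.cap
        self.slots[i] = key

    def contains(self, key: Hashable) -> bool:
        return self._find(key) is not None

    def insert(self, key: Hashable) -> None:
        if self._find(key) is not None:
            return  # already present
        if self.size >= self.cap - 1:
            raise RuntimeError("table full")
        self._place(key)  # reuses a tombstone if one comes first
        self.size += 1
        self._tick()

    def delete(self, key: Hashable) -> bool:
        i = self._find(key)
        if i is None:
            return False
        self.slots[i] = TOMBSTONE  # never EMPTY, so probe chains stay intact
        self.size -= 1
        self._tick()
        return True

    def _tick(self) -> None:
        self.ops_since_rebuild += 1
        if self.ops_since_rebuild >= self.rebuild_every:
            self._rebuild()

    def _rebuild(self) -> None:
        live = [s for s in self.slots if s is not EMPTY and s is not TOMBSTONE]
        self.slots = [EMPTY] * self.cap
        for k in live:
            self._place(k)
        # Assumed constant: plant cap/(2x) tombstones, as if inserting dummy
        # keys with evenly spaced hashes, always leaving one EMPTY slot.
        n_tomb = min(self.cap // (2 * self.x), self.cap - len(live) - 1)
        if n_tomb > 0:
            step = self.cap / n_tomb
            for j in range(n_tomb):
                i = int(j * step)
                while self.slots[i] is not EMPTY:
                    i = (i + 1) % self.cap
                self.slots[i] = TOMBSTONE
        self.ops_since_rebuild = 0
```

Deletions never turn a slot back into EMPTY, so every key stays reachable from its home slot. Rebuilds are the only point where old tombstones are cleared, and the freshly planted ones act as reserved free space spread across the table that later insertions can claim.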
Abstract
First introduced in 1954, linear probing is one of the oldest data structures in computer science, and due to its unrivaled data locality, it continues to be one of the fastest hash tables in practice. It is widely believed and taught, however, that linear probing should never be used at high load factors; this is because primary-clustering effects cause insertions at load factor 1 − 1/x to take expected time Θ(x²) (rather than the ideal Θ(x)). The dangers of primary clustering, first discovered by Knuth in 1963, have been taught to generations of computer scientists, and have influenced the design of some of the most widely used hash tables. We show that primary clustering is not a foregone conclusion. We demonstrate that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions, so that, even if a hash…
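Knuth's classical estimate makes the Θ(x²) claim concrete. For linear probing at load factor α with no deletions, the expected number of probes for an insertion (equivalently, an unsuccessful search) is approximately ½(1 + 1/(1 − α)²). Substituting α = 1 − 1/x gives:

```latex
\[
  C'_{\alpha} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right),
  \qquad \alpha = 1 - \frac{1}{x}
  \;\Longrightarrow\;
  C'_{\alpha} \approx \frac{1 + x^{2}}{2} = \Theta(x^{2}).
\]
```

By contrast, an idealized scheme that probes independent uniformly random slots needs about 1/(1 − α) = x probes, which is the Θ(x) target. At x = 10 (a 90% full table), the estimate is about 50.5 probes per insertion for linear probing versus about 10 for the idealized scheme.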
