The Anxiety of Influence: Bloom Filters in Transformer Attention Heads
Peter Balogh

TL;DR
This paper identifies and analyzes specific attention heads in transformer models that function as Bloom filter-like membership testers, revealing their properties, capacities, and roles in token processing.
Contribution
It uncovers and characterizes genuine membership-testing heads in transformers, showing that they operate at multiple resolutions, generalize, and coexist with other computational functions.
Findings
Three genuine membership-testing heads exhibit Bloom filter-like behavior.
These heads are concentrated in early layers and respond to repeated tokens broadly.
Membership heads contribute to processing both repeated and novel tokens.
Abstract
Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spectrum of membership-testing strategies. Two heads (L0H1 and L0H5 in GPT-2 small) function as high-precision membership filters with false positive rates of 0-4% even at 180 unique context tokens -- well above the bit capacity of a classical Bloom filter. A third head (L1H11) shows the classic Bloom filter capacity curve: its false positive rate follows the theoretical Bloom filter formula with a fitted capacity in bits, saturating as the number of unique context tokens grows. A fourth head initially identified as a Bloom filter (L3H0) was…
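For readers unfamiliar with the capacity curve the abstract refers to, the following is a minimal sketch of the standard false positive approximation for a classical Bloom filter, p = (1 - e^(-kn/m))^k, where n is the number of distinct items inserted, m the number of bits, and k the number of hash functions. The function name `bloom_fpr` and the values m = 64 and k = 1 are illustrative assumptions, not the fitted parameters from the paper, which are not given in this excerpt.

```python
import math


def bloom_fpr(n_items: int, m_bits: int, k_hashes: int = 1) -> float:
    """Approximate false positive rate of a classical Bloom filter.

    Uses the textbook approximation p = (1 - exp(-k * n / m)) ** k.
    All parameter values passed below are hypothetical, chosen only
    to show the shape of the curve.
    """
    if m_bits <= 0 or k_hashes <= 0:
        raise ValueError("m_bits and k_hashes must be positive")
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Illustrative sweep over the number of unique context tokens.
# m_bits=64 and k_hashes=1 are assumptions for this sketch only.
unique_token_counts = (0, 20, 60, 180)
rates = [bloom_fpr(n, m_bits=64, k_hashes=1) for n in unique_token_counts]

for n, p in zip(unique_token_counts, rates):
    print(f"n={n:>3}  approx. false positive rate={p:.3f}")
```

Under this approximation the rate starts at zero for an empty filter and rises monotonically toward 1 as the number of distinct items grows relative to the bit budget, which is the saturation behavior described for the third head. A head that keeps a low false positive rate well past that point would not be following this curve.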
