How Many Features Can a Language Model Store Under the Linear Representation Hypothesis?
Nikhil Garg, Jon Kleinberg, Kenny Peng

TL;DR
This paper develops a mathematical framework to analyze how many features a language model can store and access linearly, providing bounds that support the superposition hypothesis in neural representations.
Contribution
The paper establishes nearly-matching upper and lower bounds for linear compressed sensing, showing that linear accessibility is a stronger requirement than linear representation alone.
Findings
Under the LRH, a layer can store exponentially many features relative to its number of neurons.
Linear accessibility is a stronger condition than linear representation.
The bounds under linear decoding differ significantly from classical compressed sensing results, which permit non-linear decoding.
Abstract
We introduce a mathematical framework for the linear representation hypothesis (LRH), which asserts that intermediate layers of language models store features linearly. We separate the hypothesis into two claims: linear representation (features are linearly embedded in neuron activations) and linear accessibility (features can be linearly decoded). We then ask: How many neurons d suffice to both linearly represent and linearly access m features? Classical results in compressed sensing imply that for k-sparse inputs, d = O(k log(m/k)) suffices if we allow non-linear decoding algorithms (Candes and Tao, 2006; Candes et al., 2006; Donoho, 2006). However, the additional requirement of linear decoding takes the problem out of classical compressed sensing and into linear compressed sensing. Our main theoretical result establishes nearly-matching upper and lower bounds for linear…
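The two claims in the abstract can be illustrated with a small numerical sketch. It is not the paper's construction: the random Gaussian feature directions, the binary k-sparse input, the 0.5 threshold, and all function names are assumptions chosen for illustration. Features are embedded linearly (h = W x), and each feature is read back with a single linear probe (a dot product with its own direction) followed by a fixed threshold.

```python
import numpy as np

def random_feature_matrix(m, d, rng):
    """Return a d x m matrix whose columns are random unit vectors.

    Assumption for this sketch: one random direction per feature.
    """
    W = rng.standard_normal((d, m))
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def k_sparse_binary(m, k, rng):
    """Return a length-m 0/1 vector with exactly k active features."""
    x = np.zeros(m)
    x[rng.choice(m, size=k, replace=False)] = 1.0
    return x

def linear_readout(W, h):
    """Linear access: score each feature by its dot product with h."""
    return W.T @ h

rng = np.random.default_rng(0)
m, d, k = 4096, 1024, 4          # more features than neurons (m > d)

W = random_feature_matrix(m, d, rng)
x = k_sparse_binary(m, k, rng)

h = W @ x                        # linear representation in d neurons
scores = linear_readout(W, h)    # one linear probe per feature
decoded = (scores > 0.5).astype(float)

print("features:", m, "neurons:", d, "active:", k)
print("all features recovered:", np.array_equal(decoded, x))
```

With these sizes, the interference between random directions is small relative to the threshold, so the linear probes separate active from inactive features. The sketch only shows that linear access is possible when m > d for sparse inputs. It does not test the paper's bounds on how d must scale with k and m.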
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
