Mathematical Models of Computation in Superposition
Kaarel Hänni, Jake Mendel, Dmitry Vaintrob, Lawrence Chan

TL;DR
This paper introduces mathematical models demonstrating how superposition can be actively used for efficient computation in neural networks, challenging the view that superposition only hinders interpretability.
Contribution
The work constructs neural network models that leverage superposition for efficient circuit emulation, extending to deep networks and providing insights for interpretability.
Findings
Single-layer MLP emulates pairwise AND of m features in superposition using Õ(m^{2/3}) neurons
Deep fully-connected networks of width d with error-correction layers emulate circuits of width Õ(d^{1.5}) and polynomial depth
Potential applications for interpreting superposition-based neural computation
Abstract
Superposition, when a neural network represents more "features" than it has dimensions, seems to pose a serious challenge to mechanistically interpreting current AI systems. Existing theory work studies representational superposition, where superposition is only used when passing information through bottlenecks. In this work, we present mathematical models of computation in superposition, where superposition is actively helpful for efficiently accomplishing the task. We first construct a task of efficiently emulating a circuit that takes the AND of each of the (m choose 2) pairs among m features. We construct a 1-layer MLP that uses superposition to perform this task up to ε-error, where the network only requires Õ(m^{2/3}) neurons, even when the input features are themselves in superposition. We generalize this construction to…
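To make the pairwise-AND task concrete, here is a minimal Python sketch of the non-superposed baseline, not the paper's construction. It relies on the standard identity ReLU(a + b - 1) = a AND b for boolean a, b, and spends one neuron per pair. The helper names (`relu`, `pairwise_and_naive`, `neuron_counts`) are hypothetical and chosen only for illustration. The count comparison ignores the constants and logarithmic factors hidden in Õ, so it is an order-of-magnitude picture only.

```python
from itertools import combinations
from math import comb


def relu(z):
    """Rectified linear unit on a scalar."""
    return max(0.0, z)


def pairwise_and_naive(x):
    """AND of every pair of boolean features, one ReLU neuron per pair.

    Uses ReLU(a + b - 1) = a AND b for a, b in {0, 1}.
    This baseline needs comb(m, 2) neurons for m features.
    """
    m = len(x)
    return {(i, j): relu(x[i] + x[j] - 1.0) for i, j in combinations(range(m), 2)}


def neuron_counts(m):
    """Return (baseline neuron count, m^(2/3)) for m features.

    The second value drops the constants and log factors hidden in
    the Õ(m^{2/3}) bound, so it is only a rough scale.
    """
    return comb(m, 2), m ** (2 / 3)


x = [1, 0, 1, 1]              # four boolean features
ands = pairwise_and_naive(x)  # six pairwise ANDs
baseline, superposed_scale = neuron_counts(1000)
```

For m = 1000 the baseline uses 499500 neurons, while m^{2/3} is about 100, which is the gap the superposition construction is meant to close.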
Taxonomy
Topics: Computability, Logic, AI Algorithms
