Modular addition without black-boxes: Compressing explanations of MLPs that compute numerical integration
Chun Hei Yip, Rajashree Agrawal, Lawrence Chan, Jason Gross

TL;DR
This paper introduces a novel method for compressing nonlinear feature-maps in MLPs by interpreting them as quadrature schemes, advancing mechanistic interpretability of neural networks.
Contribution
It presents the first case study in rigorously compressing nonlinear feature-maps in MLPs, using the infinite-width lens to study the ReLU MLP analytically.
Findings
MLP layers can be interpreted as evaluating quadrature schemes.
ReLU MLP behavior can be approximated by integrals in the infinite-width limit.
The approach targets a non-vacuous bound on ReLU MLP behavior that can be computed in time linear in the parameter count of the circuit.
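The quadrature reading of the findings above can be illustrated with a toy example. This is a minimal sketch and not the paper's construction: the phase-indexed neurons, the choice of integrand, and the names `toy_mlp`, `quadrature_weights`, and `infinite_width_limit` are all assumptions made for illustration. Each hidden neuron evaluates ReLU(cos(x - phi_i)) at its own phase phi_i, and the output weights act as interval widths, so the post-activation matrix multiplication is a Riemann-style sum approximating the integral of ReLU(cos(x - phi)) cos(phi) over one period, which equals (pi/2) cos(x).

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def quadrature_weights(phases):
    """Interval width owned by each (sorted) phase on the circle.

    Each phase gets half the gap to its neighbour on either side,
    so the widths sum to the full period 2*pi.
    """
    gaps = np.diff(phases, append=phases[0] + 2 * np.pi)  # cyclic gap to next phase
    return 0.5 * (gaps + np.roll(gaps, 1))


def toy_mlp(x, phases, weights):
    """Hypothetical one-hidden-layer ReLU network with one neuron per phase."""
    hidden = relu(np.cos(x[:, None] - phases[None, :]))  # shape (len(x), width)
    return hidden @ (weights * np.cos(phases))  # post-activation matmul = quadrature sum


def infinite_width_limit(x):
    """Closed form of the integral that the sum above approximates."""
    return (np.pi / 2) * np.cos(x)


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = np.linspace(-np.pi, np.pi, 201)

max_error = {}
for width in (16, 128, 2048):
    # Assumption: neuron phases are irregularly spaced, here drawn at random.
    phases = np.sort(rng.uniform(0.0, 2 * np.pi, size=width))
    weights = quadrature_weights(phases)
    gap = np.abs(toy_mlp(x, phases, weights) - infinite_width_limit(x))
    max_error[width] = float(gap.max())

for width, err in max_error.items():
    print(f"width={width:5d}  max |sum - integral| = {err:.2e}")
```

In this sketch the discrepancy between the finite sum and the integral is the quadrature error, which is the kind of quantity a bound on the MLP's behavior would need to control.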
Abstract
The goal of mechanistic interpretability is discovering simpler, low-rank algorithms implemented by models. While we can compress activations into features, compressing nonlinear feature-maps -- like MLP layers -- is an open problem. In this work, we present the first case study in rigorously compressing nonlinear feature-maps, which are the leading asymptotic bottleneck to compressing small transformer models. We work in the classic setting of the modular addition models, and target a non-vacuous bound on the behaviour of the ReLU MLP in time linear in the parameter-count of the circuit. To study the ReLU MLP analytically, we use the infinite-width lens, which turns post-activation matrix multiplications into approximate integrals. We discover a novel interpretation of the MLP layer in one-layer transformers implementing the "pizza" algorithm: the MLP can be understood as evaluating…
Taxonomy
Topics: Mathematics, Computing, and Information Processing
