Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
Nilesh Sarkar, Dawar Jyoti Deka

TL;DR
This paper argues that the performance limit in knowledge distillation is geometric, set by the superposition capacity of the student network, and shows how to predict this limit from sparse autoencoder measurements.
Contribution
It introduces a geometric minimum-width theorem for knowledge distillation based on superposition theory, linking feature capacity to network width and validating it empirically.
Findings
Performance saturates at a geometric loss floor related to feature superposition.
The loss floor can be predicted from autoencoder-measured feature capacity.
Coarse concepts survive even with significant feature loss, indicating the floor arises from fine-grained feature loss.
Abstract
Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is geometric. Through superposition, neural networks represent far more features than they have dimensions, and a student of a given width can encode at most a number of features set by that width and a sparsity-dependent capacity function. Features beyond this budget are permanently lost, yielding an importance-weighted loss floor. We validate on a toy model (48 configurations, median accuracy >93%) and on Pythia-410M, where sparse autoencoders measure the teacher's feature count and sparsity, which together determine a critical student width. Distillation into five student widths confirms the predicted monotonic floor ordering. The observed floor decomposes…
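The budget argument in the abstract can be illustrated with a small sketch. It rests on three assumptions that do not come from the paper. First, the feature budget is taken to be the student width multiplied by a capacity function of sparsity. Second, the student is assumed to keep the most important features first, so the floor equals the summed importance of the features that do not fit. Third, `toy_capacity` is a hypothetical placeholder, not the paper's capacity function. All function names here are illustrative.

```python
import math
from typing import Callable, Sequence

# Hypothetical capacity function g(sparsity); a placeholder, NOT the paper's formula.
def toy_capacity(sparsity: float) -> float:
    return 1.0 / (1.0 - sparsity)

def capacity_budget(width: int, sparsity: float,
                    g: Callable[[float], float]) -> int:
    """Assumed feature budget: width * g(sparsity), rounded down."""
    return math.floor(width * g(sparsity))

def loss_floor(importances: Sequence[float], width: int, sparsity: float,
               g: Callable[[float], float]) -> float:
    """Importance-weighted floor: total importance of features beyond the budget.

    Assumes the student retains the highest-importance features first.
    """
    ranked = sorted(importances, reverse=True)
    budget = capacity_budget(width, sparsity, g)
    return sum(ranked[budget:])

def critical_width(num_features: int, sparsity: float,
                   g: Callable[[float], float]) -> int:
    """Smallest width whose assumed budget covers every feature."""
    return math.ceil(num_features / g(sparsity))

if __name__ == "__main__":
    importances = [8, 4, 2, 1, 0.5, 0.25]  # made-up feature importances
    for width in (1, 2, 3):
        print(width, loss_floor(importances, width, 0.5, toy_capacity))
    print("critical width:", critical_width(len(importances), 0.5, toy_capacity))
```

In this toy setting, narrower students lose more features and therefore hit a higher floor, which mirrors the monotonic floor ordering described above. The floor reaches zero once the width reaches the critical width.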
