A Compression Perspective on Simplicity Bias
Tom Marty, Eric Elmoznino, Leo Gagnon, Tejas Kasetty, Mizu Nishikawa-Toomey, Sarthak Mittal, Guillaume Lajoie, Dhanya Sridhar

TL;DR
This paper uses the Minimum Description Length principle to explain why neural networks favor simple functions, and how the amount of training data determines the complexity and robustness of the features they learn.
Contribution
It formalizes simplicity bias as a trade-off in optimal two-part compression between model complexity and predictive power, and uses that trade-off to predict which features neural networks select in different data regimes.
Findings
Neural network feature selection follows optimal compression trajectories.
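As training data increases, learners shift from simple, often spurious, features to complex ones, but only once the savings in data encoding cost outweigh the added model complexity.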
Limiting data acts as a regularizer against unreliable complex cues.
Abstract
Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minimum Description Length principle, formalizing supervised learning as a problem of optimal two-part lossless compression. Our theory explains how simplicity bias governs feature selection in neural networks through a fundamental trade-off between model complexity (the cost of describing the hypothesis) and predictive power (the cost of describing the data). Our framework predicts that as the amount of available training data increases, learners transition through qualitatively different features -- from simple spurious shortcuts to complex features -- only when the reduction in data encoding cost justifies the increased model complexity. Consequently, we identify distinct data regimes where…
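The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's formalism: it assumes a two-part code whose total length is the model cost plus the number of examples times a per-example data cost, and it takes that per-example cost to be the binary entropy of a feature's accuracy. The two features, their bit costs, and their accuracies are hypothetical values chosen only to make the crossover visible.

```python
import math


def binary_entropy_bits(p):
    """Expected bits per label when a feature is correct with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def two_part_length(model_bits, accuracy, n):
    """Two-part code length: cost of the hypothesis plus cost of n labels given it."""
    return model_bits + n * binary_entropy_bits(accuracy)


# Hypothetical features (illustrative numbers, not taken from the paper).
SIMPLE = {"model_bits": 10.0, "accuracy": 0.90}    # cheap to describe, less predictive
COMPLEX = {"model_bits": 200.0, "accuracy": 0.99}  # costly to describe, more predictive


def preferred_feature(n):
    """Return the feature with the shorter total description length at n examples."""
    simple_len = two_part_length(n=n, **SIMPLE)
    complex_len = two_part_length(n=n, **COMPLEX)
    return "simple" if simple_len <= complex_len else "complex"


def crossover_n():
    """Dataset size at which the two total description lengths are equal."""
    extra_model_bits = COMPLEX["model_bits"] - SIMPLE["model_bits"]
    saved_bits_per_example = (
        binary_entropy_bits(SIMPLE["accuracy"])
        - binary_entropy_bits(COMPLEX["accuracy"])
    )
    return extra_model_bits / saved_bits_per_example


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, preferred_feature(n))
    print("crossover at about", round(crossover_n()), "examples")
```

Under these assumptions the simple feature gives the shorter code on small datasets, and the complex feature takes over only once the per-example savings have paid for its larger model cost, which mirrors the qualitative transition the abstract describes.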
