Why and When Deep is Better than Shallow: Implementation-Agnostic State-Transition Model of Deep Learning
Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda

TL;DR
This paper analyzes the conditions under which increased depth in neural networks improves generalization, using an implementation-agnostic state-transition model to identify mechanisms that favor deep learning.
Contribution
It introduces a theoretical framework whose generalization bounds separate implementation error, approximation error, and statistical complexity, clarifying when depth offers statistical advantages in neural networks.
Findings
Depth improves generalization when approximation improves rapidly and the transition semigroup is geometrically tame.
The paper identifies geometric and semigroup mechanisms that keep the entropy contribution to the depth-dependent variance term saturated or polynomial in depth.
It contrasts these with separation mechanisms that recover the classical exponential-growth obstruction.
Abstract
Why and when does depth improve generalization? We study this question in an implementation-agnostic state-transition model, where a predictor of a given depth is a readout class composed with the word ball generated by hidden state transitions. Generalization bounds separate implementation error, approximation error, and statistical complexity, and upper bound the depth-dependent variance term by a Dudley entropy integral over this word ball, with a conditional lower-bound diagnostic under readout separation. We identify geometric and semigroup mechanisms that keep this entropy contribution saturated or polynomial, and contrast them with separation mechanisms that recover the classical exponential-growth obstruction. Coupling these variance upper bounds with approximation rates gives typical depth trade-off patterns, clarifying that depth is statistically favorable when approximation…
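To make the contrast between polynomial and exponential growth of the word ball concrete, here is a minimal Python sketch. It is not taken from the paper: the 2x2 integer matrices standing in for hidden state transitions, and the helper names `matmul` and `word_ball_sizes`, are hypothetical choices for illustration. Counting distinct elements of the ball is only a crude proxy for an entropy term (its logarithm), not the covering-number quantity that a Dudley integral uses.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a hidden state transition: a 2x2 integer matrix,
# stored as nested tuples so it can be placed in a set.
Matrix = Tuple[Tuple[int, int], Tuple[int, int]]

IDENTITY: Matrix = ((1, 0), (0, 1))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Exact 2x2 integer matrix product."""
    return tuple(
        tuple(sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2))
        for i in range(2)
    )


def word_ball_sizes(generators: Sequence[Matrix], max_depth: int) -> List[int]:
    """Number of distinct products of at most k generators, for k = 0..max_depth."""
    ball = {IDENTITY}        # all elements reachable with words of length <= k
    frontier = {IDENTITY}    # elements first reached at length exactly k
    sizes = [1]
    for _ in range(max_depth):
        # Extending only the newest elements suffices: extending older ones
        # produces elements that are already in the ball.
        frontier = {matmul(g, w) for g in generators for w in frontier} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes


# Commuting (diagonal) transitions: the ball grows polynomially in depth.
commuting = [((2, 0), (0, 1)), ((1, 0), (0, 3))]
# These two unipotent matrices generate a free monoid: exponential growth.
free = [((1, 1), (0, 1)), ((1, 0), (1, 1))]

print(word_ball_sizes(commuting, 4))  # [1, 3, 6, 10, 15]
print(word_ball_sizes(free, 4))       # [1, 3, 7, 15, 31]
```

In the commuting case the log of the ball size grows only logarithmically with depth, while in the free case it grows linearly, which is the kind of gap the abstract attributes to tame versus separating mechanisms.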
