Why and When Deep is Better than Shallow: Implementation-Agnostic State-Transition Model of Deep Learning

Sho Sonoda; Yuka Hashimoto; Isao Ishikawa; Masahiro Ikeda

arXiv:2505.15064·cs.LG·May 8, 2026


TL;DR

This paper analyzes the conditions under which increased depth in neural networks improves generalization, using an implementation-agnostic state-transition model to identify mechanisms that favor deep learning.

Contribution

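It introduces a theoretical framework whose generalization bounds separate implementation error, approximation error, and statistical complexity, and which upper-bounds the depth-dependent variance term by a Dudley entropy integral over the word ball $B(k, F)$, clarifying when depth offers a statistical advantage.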

Findings

01

Depth improves generalization when approximation improves rapidly and the transition semigroup is geometrically tame.

02

Identifies geometric and semigroup mechanisms that keep the entropy contribution of the word ball $B(k, F)$ saturated or polynomial in depth.

03

Contrasts mechanisms that tame entropy with separation mechanisms that recover the classical exponential-growth obstruction.
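The contrast between saturated and exponential growth can be illustrated with a toy computation. The sketch below is not from the paper: it assumes that $B(k, F)$ denotes the set of distinct compositions of at most $k$ transitions from $F$ (the usual word-metric ball; the paper's exact definition may differ), and it uses a finite state space with maps stored as tuples. The function names `compose`, `word_ball_sizes`, and `free_ball_sizes` are hypothetical.

```python
def compose(f, g):
    """Return the map x -> f(g(x)) for maps stored as tuples on {0, ..., n-1}."""
    return tuple(f[g[x]] for x in range(len(g)))


def word_ball_sizes(generators, max_depth):
    """Sizes of B(k, F) for k = 0..max_depth.

    Assumption: B(k, F) is the set of distinct compositions of at most k
    generators, with the identity map as the empty composition.
    """
    n = len(generators[0])
    identity = tuple(range(n))
    ball = {identity}
    frontier = {identity}  # elements first reached at the current depth
    sizes = [1]
    for _ in range(max_depth):
        # New elements can only arise by extending the previous frontier.
        frontier = {compose(f, w) for f in generators for w in frontier} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes


def free_ball_sizes(num_generators, max_depth):
    """Ball sizes when no two distinct words coincide (free monoid)."""
    return [sum(num_generators ** i for i in range(k + 1))
            for k in range(max_depth + 1)]


# Two contracting transitions on three states: a clipped shift and a constant.
shift_down = (0, 0, 1)   # x -> max(x - 1, 0)
collapse = (0, 0, 0)     # x -> 0

tame = word_ball_sizes([shift_down, collapse], 4)
free = free_ball_sizes(2, 4)
print(tame)  # [1, 3, 3, 3, 3]
print(free)  # [1, 3, 7, 15, 31]
```

In this toy case every longer word collapses onto a map already reached at depth 1, so the ball stops growing, whereas two generators with no relations give $2^{k+1} - 1$ distinct words at depth $k$.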

Abstract

Why and when does depth improve generalization? We study this question in an implementation-agnostic state-transition model, where a depth-$k$ predictor is a readout class $H$ composed with the word ball $B(k, F)$ generated by hidden state transitions. Generalization bounds separate implementation error, approximation error, and statistical complexity, and upper bound the depth-dependent variance term by a Dudley entropy integral over $B(k, F)$, with a conditional lower-bound diagnostic under readout separation. We identify geometric and semigroup mechanisms that keep this entropy contribution saturated or polynomial, and contrast them with separation mechanisms that recover the classical exponential-growth obstruction. Coupling these variance upper bounds with approximation rates gives typical depth trade-off patterns, clarifying that depth is statistically favorable when approximation…
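For orientation, a Dudley entropy integral bound has the following generic textbook shape. This is a sketch under stated assumptions, not the paper's theorem: the sample size $n$, the metric $d$ on $B(k, F)$, the diameter $D$, and the covering number $N(\epsilon, \cdot, d)$ are notation introduced here, and constants are suppressed.

$$
\text{variance term at depth } k \;\lesssim\; \frac{1}{\sqrt{n}} \int_0^{D} \sqrt{\log N\bigl(\epsilon, B(k, F), d\bigr)}\, d\epsilon .
$$

When $B(k, F)$ is finite, $N(\epsilon, B(k, F), d) \le |B(k, F)|$ for every $\epsilon > 0$, so the integrand is at most $\sqrt{\log |B(k, F)|}$. A ball whose size saturates in $k$ therefore gives a depth-independent bound on the integral, while a ball that grows like $m^k$ gives an integrand of order $\sqrt{k \log m}$.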
