A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks
Daning Cheng, Zeyu Liu, Jun Sun, Fen Xia, Boyang Zhang, Dongping Liu, Yunquan Zhang

TL;DR
This paper provides a theoretical framework explaining how depth expansion in normalized residual networks can improve test performance, linking representational, optimization, and generalization factors.
Contribution
It introduces a unified, theorem-driven approach to analyze when residual depth expansion lowers test risk in normalized residual networks.
Findings
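Proves that, under a first-order descent condition near zero initialization, the expanded hypothesis class contains an auxiliary "jumpboard" model with strictly smaller population risk than the original model.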
Establishes a norm-based Rademacher complexity bound for expanded models.
Provides two test-risk guarantees, one population-based and one train/test-level, for residual depth expansion.
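For orientation, the textbook form of a Rademacher-complexity generalization bound is sketched below. It is the standard result for a loss bounded in [0, 1], not the paper's specific norm-based bound; the symbols (hypothesis class F, loss ℓ, sample size n, confidence level δ) are assumptions introduced here for illustration. With probability at least 1 − δ over an i.i.d. sample of size n, for every f in F:

```latex
R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{F}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
```

Here R(f) is the population risk, \widehat{R}_n(f) the empirical risk, and \mathfrak{R}_n(\ell \circ \mathcal{F}) the Rademacher complexity of the loss class. A norm-based bound of the kind listed above would control the middle term through norms of the network weights.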
Abstract
The scaling behavior, in which test performance often improves as model size and data increase, is a central empirical phenomenon in modern deep learning, yet its theoretical basis remains incomplete. In this paper, we study depth expansion in normalized residual networks: starting from a trained model in an old hypothesis class, we insert a new residual block at an intermediate layer and ask when such an expansion can yield a provable improvement in test risk. We develop a unified framework that decomposes this question into representational gain, optimization gain, and generalization transfer. First, under a first-order descent condition near zero initialization, we prove that the expanded hypothesis class contains an auxiliary jumpboard model with strictly smaller population risk than the original model. Second, under norm control tailored to post-normalized residual architectures,…
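The expansion step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a toy post-normalized residual block of the form Norm(x + W2 relu(W1 x)) with a parameter-free layer normalization, and it zero-initializes only the output projection of the inserted block. These architectural details and all function names (`layer_norm`, `post_norm_block`, `expand`) are hypothetical choices made for illustration, not the paper's construction. Under these assumptions the expanded network starts out computing the same function as the original, so any subsequent descent step starts from the old model's risk.

```python
import numpy as np

def layer_norm(x, eps=1e-8):
    """Parameter-free normalization: zero mean, unit variance per row."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, w1, w2):
    """Toy post-normalized residual block: Norm(x + relu(x W1) W2)."""
    h = np.maximum(x @ w1, 0.0)
    return layer_norm(x + h @ w2)

def forward(x, blocks):
    """Normalize the input, then apply each residual block in order."""
    x = layer_norm(x)
    for w1, w2 in blocks:
        x = post_norm_block(x, w1, w2)
    return x

def expand(blocks, position, rng, d, hidden):
    """Insert a new block at `position` with a zero output projection.

    With W2 = 0 the residual branch contributes nothing, and the block
    reduces to Norm(x). Because x is already normalized at that point,
    the block acts (up to eps) as the identity.
    """
    w1 = rng.normal(size=(d, hidden)) / np.sqrt(d)
    w2 = np.zeros((hidden, d))
    return blocks[:position] + [(w1, w2)] + blocks[position:]

rng = np.random.default_rng(0)
d, hidden = 8, 16

# "Old" model: three randomly initialized blocks standing in for a trained network.
old_blocks = [
    (rng.normal(size=(d, hidden)) / np.sqrt(d),
     rng.normal(size=(hidden, d)) / np.sqrt(hidden))
    for _ in range(3)
]
x = rng.normal(size=(4, d))

# Expanded model: one extra block inserted at an intermediate layer.
new_blocks = expand(old_blocks, position=1, rng=rng, d=d, hidden=hidden)

old_out = forward(x, old_blocks)
new_out = forward(x, new_blocks)
max_gap = float(np.max(np.abs(old_out - new_out)))
print("blocks:", len(old_blocks), "->", len(new_blocks))
print("max output difference at initialization:", max_gap)
```

The design choice worth noting is that only the output projection is zeroed: the inner weights `w1` stay random, so the gradient with respect to `w2` is generally nonzero and the new block can move away from the identity during training.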
