A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

Daning Cheng; Zeyu Liu; Jun Sun; Fen Xia; Boyang Zhang; Dongping Liu; Yunquan Zhang

arXiv:2605.08297·cs.LG·May 12, 2026

TL;DR

This paper provides a theoretical framework explaining how depth expansion in normalized residual networks can improve test performance, linking representational, optimization, and generalization factors.

Contribution

It introduces a unified, theorem-driven framework for analyzing when residual depth expansion improves (that is, lowers) test risk in normalized residual networks.

Findings

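01 Under a first-order descent condition near zero initialization, proves that the expanded hypothesis class contains an auxiliary "jumpboard" model with strictly smaller population risk than the original model.

02 Establishes a norm-based Rademacher complexity bound for expanded models.

03 Provides two test-risk guarantees for residual depth expansion: one at the population level and one at the train/test level.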
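
For orientation, the block below recalls the standard textbook form of a Rademacher-complexity generalization bound. It is not the paper's bound, whose norm-based form and constants are not given on this page. The symbols are assumptions introduced here: $\mathcal{H}$ is a hypothesis class, $\ell$ a loss with values in $[0,1]$, $R$ the population risk, $\widehat{R}_n$ the empirical risk on $n$ i.i.d. samples, and $\mathfrak{R}_n$ the Rademacher complexity of the loss class.

```latex
% Textbook bound, stated for context only (assumes a loss bounded in [0,1]).
% With probability at least 1 - \delta over the draw of n i.i.d. samples,
% the following holds simultaneously for every h in \mathcal{H}:
\[
  R(h) \;\le\; \widehat{R}_n(h)
  \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H})
  \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}} .
\]
```

Read this way, a norm-based bound on the complexity term for the expanded class is what would let a training-side gain carry over to test risk.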

Abstract

The scaling behavior, in which test performance often improves as model size and data increase, is a central empirical phenomenon in modern deep learning, yet its theoretical basis remains incomplete. In this paper, we study depth expansion in normalized residual networks: starting from a trained model in an old hypothesis class, we insert a new residual block at an intermediate layer and ask when such an expansion can yield a provable improvement in test risk. We develop a unified framework that decomposes this question into representational gain, optimization gain, and generalization transfer. First, under a first-order descent condition near zero initialization, we prove that the expanded hypothesis class contains an auxiliary jumpboard model with strictly smaller population risk than the original model. Second, under norm control tailored to post-normalized residual architectures,…
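
The depth-expansion setup described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' construction: the block form `LN(x + relu(x W1) W2)`, the affine-free layer norm, the helper names (`layer_norm`, `post_norm_block`, `forward`), and all sizes are hypothetical. The sketch shows only that a new post-normalized residual block whose output projection is zero-initialized leaves the network function essentially unchanged, so the expanded model starts from the old model's predictions.

```python
import numpy as np

def layer_norm(x, eps=1e-12):
    """Normalize each row to zero mean and unit variance (no learned affine)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, w1, w2):
    """Hypothetical post-normalized residual block: LN(x + relu(x @ w1) @ w2)."""
    return layer_norm(x + np.maximum(x @ w1, 0.0) @ w2)

def forward(x, blocks):
    """Normalize the input, then apply each residual block in order."""
    h = layer_norm(x)
    for w1, w2 in blocks:
        h = post_norm_block(h, w1, w2)
    return h

rng = np.random.default_rng(0)
d, hidden = 8, 16  # illustrative sizes, not taken from the paper

# Stand-in for the trained "old" model: three blocks with random weights.
old_blocks = [
    (0.1 * rng.normal(size=(d, hidden)), 0.1 * rng.normal(size=(hidden, d)))
    for _ in range(3)
]

# New block with a zero-initialized output projection, so its residual
# branch contributes nothing at initialization.
new_block = (0.1 * rng.normal(size=(d, hidden)), np.zeros((hidden, d)))

# Insert the new block at an intermediate layer (here, after the second block).
expanded_blocks = old_blocks[:2] + [new_block] + old_blocks[2:]

x = rng.normal(size=(4, d))
old_out = forward(x, old_blocks)
new_out = forward(x, expanded_blocks)

# Largest absolute difference between old and expanded outputs at initialization.
max_gap = float(np.max(np.abs(old_out - new_out)))
print("max |old - expanded| at zero init:", max_gap)
```

The gap is tiny rather than exactly zero because the inserted block re-normalizes an input that is already normalized, and the small `eps` makes that step only approximately the identity. Training the new block away from zero is where any risk reduction would have to come from.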
