Geometric Layer-wise Approximation Rates for Deep Networks
Shijun Zhang, Zuowei Shen, Yuesheng Xu

TL;DR
This paper develops a framework to understand how depth in neural networks enables scale-dependent approximation, with a design inspired by multigrade learning that refines residuals at multiple scales.
Contribution
It introduces a fixed-width, multi-depth neural network architecture that provides intermediate approximations at various scales, clarifying the role of depth in function approximation.
Findings
Approximation error at each layer is controlled by the modulus of continuity at a geometric scale.
For Lipschitz target functions, the error bound reduces to a geometric convergence rate.
Intermediate readouts serve as progressive refinements of the target function.
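The findings above can be illustrated with a small numerical sketch. It is not the paper's architecture: each "readout" here is a plain piecewise-constant approximation on a grid whose cell size shrinks geometrically, a hypothetical stand-in chosen only to show how an error bound tied to a geometric scale behaves for a Lipschitz target. The names `readout` and `sup_error`, the base `N = 4`, and the target `math.sin` are all assumptions made for illustration.

```python
import math


def readout(f, n_cells):
    """Piecewise-constant stand-in for a readout (an assumption, not the paper's network).

    The interval [0, 1] is split into n_cells equal cells and f is sampled
    at the left endpoint of the cell containing x.
    """
    def approx(x):
        k = min(int(x * n_cells), n_cells - 1)  # index of the cell containing x
        return f(k / n_cells)
    return approx


def sup_error(f, g, samples=10_001):
    """Largest |f - g| over a uniform sample of [0, 1]."""
    points = (i / (samples - 1) for i in range(samples))
    return max(abs(f(x) - g(x)) for x in points)


f = math.sin  # 1-Lipschitz on [0, 1], since |cos| <= 1
N = 4         # illustrative base for the geometric scale N**(-level)

errors = []
for level in range(1, 6):
    scale = N ** (-level)                       # geometric scale at this level
    err = sup_error(f, readout(f, N ** level))  # error of the level's readout
    errors.append((level, scale, err))
    print(f"level {level}: scale = {scale:.6f}, sup error = {err:.6f}")
```

For a 1-Lipschitz target, every point lies within one cell width of its sampling point, so the measured error at each level stays below the scale `N**(-level)` and shrinks geometrically as the level grows.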
Abstract
Depth is widely viewed as a central contributor to the success of deep neural networks, yet standard neural network approximation theory typically provides guarantees only for the final output and leaves the role of intermediate layers largely unclear. We address this gap by developing a quantitative framework in which depth admits a precise scale-dependent interpretation. Specifically, we design a single shared mixed-activation architecture of fixed width and any prescribed finite depth such that each intermediate readout is itself an approximant to the target function. For the class of target functions considered, the approximation error of each readout is controlled by a factor times the modulus of continuity at a geometric scale, for all readouts. The estimate reduces to a geometric rate if the target is Lipschitz. Our…
