Geometric Layer-wise Approximation Rates for Deep Networks

Shijun Zhang; Zuowei Shen; Yuesheng Xu

arXiv:2604.20219 · cs.LG · April 23, 2026

TL;DR

This paper develops a framework to understand how depth in neural networks enables scale-dependent approximation, with a design inspired by multigrade learning that refines residuals at multiple scales.

Contribution

It introduces a single fixed-width architecture of any prescribed finite depth in which every intermediate readout is itself an approximant to the target function, so each additional layer corresponds to a finer approximation scale.

Findings

01

The approximation error of each intermediate readout $\Phi_{\ell}$ is controlled by $(2d + 1)$ times the $L^{p}$ modulus of continuity of the target at the geometric scale $N^{-\ell}$.

02

For $1$-Lipschitz targets, the estimate reduces to the geometric rate $(2d + 1) N^{-\ell}$.

03

Intermediate readouts serve as progressive refinements of the target function.
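To make the scaling in these findings concrete, the following minimal sketch evaluates the two closed-form quantities quoted in the abstract: the fixed width $2dN + d + 2$ and the $1$-Lipschitz bound $(2d + 1) N^{-\ell}$. The function names, the choice to start the readout index at 1, and the tolerance search are illustrative assumptions, not part of the paper; the code only does arithmetic on the stated formulas and does not build the network.

```python
def network_width(d: int, N: int) -> int:
    """Fixed width 2dN + d + 2, as quoted in the abstract."""
    return 2 * d * N + d + 2


def lipschitz_bound(d: int, N: int, level: int) -> float:
    """Bound (2d + 1) * N**(-level) for a 1-Lipschitz target."""
    return (2 * d + 1) * float(N) ** (-level)


def levels_needed(d: int, N: int, eps: float) -> int:
    """Smallest readout index whose bound is at most eps.

    Assumption: readouts are indexed from 1. Requires N >= 2 and eps > 0,
    otherwise the bound never drops below eps.
    """
    if N < 2 or eps <= 0:
        raise ValueError("need N >= 2 and eps > 0")
    level = 1
    while lipschitz_bound(d, N, level) > eps:
        level += 1
    return level


if __name__ == "__main__":
    d, N = 2, 4
    print("width:", network_width(d, N))
    for level in range(1, 6):
        print(level, lipschitz_bound(d, N, level))
    print("levels for eps=1e-3:", levels_needed(d, N, 1e-3))
```

Under these assumptions, each extra readout shrinks the bound by a factor of $N$, so the number of readouts needed for a tolerance $\varepsilon$ grows only logarithmically in $1/\varepsilon$.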

Abstract

Depth is widely viewed as a central contributor to the success of deep neural networks, whereas standard neural network approximation theory typically provides guarantees only for the final output and leaves the role of intermediate layers largely unclear. We address this gap by developing a quantitative framework in which depth admits a precise scale-dependent interpretation. Specifically, we design a single shared mixed-activation architecture of fixed width $2dN + d + 2$ and any prescribed finite depth such that each intermediate readout $\Phi_{\ell}$ is itself an approximant to the target function $f$. For $f \in L^{p}([0, 1]^{d})$ with $p \in [1, \infty)$, the approximation error of $\Phi_{\ell}$ is controlled by $(2d + 1)$ times the $L^{p}$ modulus of continuity at the geometric scale $N^{-\ell}$ for all $\ell$. The estimate reduces to the geometric rate $(2d + 1) N^{-\ell}$ if $f$ is $1$-Lipschitz. Our…
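Written as a display, the estimate described in the abstract plausibly takes the following form. This is a sketch under stated assumptions: the symbol $\omega_{f}^{p}$ for the $L^{p}$ modulus of continuity is notation chosen here, and measuring the error in the $L^{p}([0,1]^{d})$ norm is inferred from the setting rather than quoted; the paper's exact definitions may differ.

```latex
% Assumed notation: \omega_{f}^{p}(t) denotes the L^p modulus of continuity of f at scale t.
\[
  \| f - \Phi_{\ell} \|_{L^{p}([0,1]^{d})}
  \;\le\; (2d + 1)\, \omega_{f}^{p}\bigl(N^{-\ell}\bigr)
  \qquad \text{for all } \ell,
\]
% For a 1-Lipschitz target, the abstract states that the estimate reduces to
\[
  (2d + 1)\, N^{-\ell}.
\]
```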
