TL;DR
This paper introduces a theoretical framework for residual networks as progressive, layer-wise approximation processes, and proposes a training principle enabling multi-depth inference and efficient deployment.
Contribution
It formalizes the concept of progressive approximation trajectories in residual networks and introduces Layer-wise Progressive Approximation (LPA), a new training method applicable across architectures.
Findings
Progressive trajectories provably exist: along them, approximation error decreases monotonically with depth.
LPA aligns each layer with its residual target, so that predictions are useful at multiple depths.
Progressive behavior is observed across residual FNNs, ResNets, and Transformers on a range of tasks.
Abstract
The Universal Approximation Theorem (UAT) guarantees universal function approximation but does not explain how residual models distribute approximation across layers. We reframe residual networks as a layer-wise approximation process that builds an approximation trajectory from input to target, and prove the existence of progressive trajectories where error decreases monotonically with depth. This result shows that residual networks can implement structured, step-by-step refinement rather than an end-to-end (E2E) black-box mapping. Building on this, we propose Layer-wise Progressive Approximation (LPA), a theoretically grounded training principle that explicitly aligns each layer with its residual target to realize such trajectories. LPA is architecture-agnostic: we observe progressive behavior in residual FNNs, ResNets, and Transformers across tasks including complex surface fitting, image…
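The idea of aligning each layer with its residual target can be illustrated with a small toy. The sketch below is not the paper's LPA procedure: it assumes a greedy, boosting-style setup in which each hypothetical block uses fixed random tanh features of the input and a least-squares readout fitted to the current residual. The names `fit_progressive_residual_stack` and `predict`, along with the depth, width, and test function, are illustrative choices only. Because a zero readout is always an admissible fit, the training error in this toy cannot increase from one block to the next.

```python
import numpy as np


def fit_progressive_residual_stack(x, y, depth=6, width=32, seed=0):
    """Greedily fit `depth` blocks; block l is trained on the residual y - h_l.

    Assumption: each block is a fixed random tanh feature map of the input
    with a least-squares readout. This is a toy stand-in, not the paper's LPA.
    """
    rng = np.random.default_rng(seed)
    h = np.zeros_like(y, dtype=float)          # running approximation h_0
    errors = [float(np.mean((y - h) ** 2))]    # training MSE at depth 0
    blocks = []
    for _ in range(depth):
        w = rng.normal(size=(x.shape[1], width))
        b = rng.normal(size=width)
        phi = np.tanh(x @ w + b)               # block features
        residual = y - h                       # this block's residual target
        coef, *_ = np.linalg.lstsq(phi, residual, rcond=None)
        h = h + phi @ coef                     # skip connection: h_{l+1} = h_l + f_l
        blocks.append((w, b, coef))
        errors.append(float(np.mean((y - h) ** 2)))
    return blocks, errors


def predict(blocks, x, depth=None):
    """Evaluate the stack truncated after `depth` blocks (all blocks if None)."""
    used = blocks if depth is None else blocks[:depth]
    h = np.zeros((x.shape[0], blocks[0][2].shape[1]))
    for w, b, coef in used:
        h = h + np.tanh(x @ w + b) @ coef
    return h


if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 200)[:, None]
    y = np.sin(2.0 * x) + 0.3 * x ** 2         # arbitrary 1-D target for the demo
    blocks, errors = fit_progressive_residual_stack(x, y)
    for d, e in enumerate(errors):
        print(f"depth {d}: train MSE = {e:.6f}")
```

Calling `predict(blocks, x, depth=k)` returns the intermediate approximation after `k` blocks, which mirrors the multi-depth inference described above: every truncated stack is itself a usable predictor, and deeper truncations fit the training data at least as well in this toy setting.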
