Multigrade Neural Network Approximation
Shijun Zhang, Zuowei Shen, Yuesheng Xu

TL;DR
This paper introduces multigrade deep learning (MGDL), a hierarchical training framework for deep neural networks that guarantees approximation error reduction through a grade-by-grade residual learning approach.
Contribution
It provides the first rigorous theoretical proof that grade-wise training of deep networks achieves vanishing approximation error, supported by operator-theoretic foundations and numerical experiments.
Findings
MGDL guarantees that residuals decrease from grade to grade for any continuous target function.
A theoretical proof shows that the residuals converge to zero for fixed-width ReLU schemes.
Numerical experiments validate the theoretical approximation guarantees.
Abstract
We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly non-convex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably one-hidden-layer models, training admits convex reformulations with global guarantees, motivating learning paradigms that improve stability while scaling to depth. MGDL builds upon this insight by training deep networks grade by grade: previously learned grades are frozen, and each new residual block is trained solely to reduce the remaining approximation error, yielding an interpretable and stable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL…
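The grade-by-grade idea in the abstract (freeze what has been learned, fit the next block only to the remaining error) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it rests on several assumptions: the names `fit_grades` and `predict` are hypothetical, each grade is taken to be a single ReLU layer fed by the frozen hidden features of the previous grade, and gradient-based training is swapped for random hidden weights plus a least-squares output layer so that the example stays short and deterministic.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_grades(x, y, num_grades=3, width=32, seed=0):
    """Grade-by-grade residual fitting (illustrative simplification, not the paper's code)."""
    rng = np.random.default_rng(seed)
    features = x                 # input seen by the current grade
    residual = y.copy()          # error left over from earlier grades
    grades = []
    errors = [float(np.mean(residual ** 2))]
    for _ in range(num_grades):
        # Assumption: hidden weights are drawn at random and then frozen,
        # standing in for the trained hidden layer of a real grade.
        fan_in = features.shape[1]
        w = rng.normal(size=(fan_in, width)) / np.sqrt(fan_in)
        b = rng.normal(size=width)
        hidden = relu(features @ w + b)
        # Output layer of this grade: least squares against the current residual.
        design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
        coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
        residual = residual - design @ coef
        grades.append((w, b, coef))
        errors.append(float(np.mean(residual ** 2)))
        # Earlier grades stay frozen; the next grade builds on these features.
        features = hidden
    return grades, errors

def predict(grades, x):
    """Sum the contributions of all grades."""
    features, total = x, 0.0
    for w, b, coef in grades:
        hidden = relu(features @ w + b)
        design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
        total = total + design @ coef
        features = hidden
    return total

# Toy target: a smooth 1-D function sampled on a grid.
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * np.pi * x)
grades, errors = fit_grades(x, y)
```

Here `errors` holds the mean squared training residual before the first grade and after each subsequent one. In this simplified setting the sequence cannot increase, because a zero output layer is always a feasible least-squares solution; that is a property of the sketch and should not be read as a restatement of the paper's theorems.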
