Multigrade Neural Network Approximation

Shijun Zhang; Zuowei Shen; Yuesheng Xu

arXiv:2601.16884 · cs.LG · April 3, 2026


TL;DR

This paper introduces multigrade deep learning (MGDL), a hierarchical training framework for deep neural networks that guarantees approximation error reduction through a grade-by-grade residual learning approach.

Contribution

The paper provides the first rigorous theoretical proof that grade-wise training of deep networks achieves vanishing approximation error. The proof rests on an operator-theoretic foundation, and numerical experiments support it.

Findings

01. MGDL guarantees that the residual decreases from grade to grade for any continuous target function.
02. A theoretical proof shows that the residuals converge to zero for fixed-width ReLU schemes.
03. Numerical experiments validate the theoretical approximation guarantees.
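The two theoretical findings can be restated compactly. The symbols below are introduced here for illustration and are not necessarily the paper's notation: f is the target function, g_ℓ is the function learned by grade ℓ, and r_ℓ is the residual left after grade ℓ. The norm is left unspecified because this summary does not name one; the limit expresses the convergence claim for fixed-width ReLU schemes.

```latex
r_0 = f, \qquad r_\ell = r_{\ell-1} - g_\ell \quad (\ell = 1, 2, \dots),
\qquad\text{so}\qquad f = \sum_{k=1}^{\ell} g_k + r_\ell ,

\|r_\ell\| \le \|r_{\ell-1}\| \quad \text{for every } \ell \ge 1,
\qquad
\lim_{\ell \to \infty} \|r_\ell\| = 0 .
```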

Abstract

We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly non-convex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably one-hidden-layer ReLU models, training admits convex reformulations with global guarantees, motivating learning paradigms that improve stability while scaling to depth. MGDL builds upon this insight by training deep networks grade by grade: previously learned grades are frozen, and each new residual block is trained solely to reduce the remaining approximation error, yielding an interpretable and stable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL…
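The grade-by-grade idea can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm and rests on several assumptions of ours: each grade is a fixed-width ReLU layer applied to the frozen features produced by the previous grade; its hidden weights are drawn at random rather than trained (a stand-in for gradient training); and only its linear output layer is fitted to the current residual by least squares, which is a convex problem. The names `train_grade` and `multigrade_fit` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def train_grade(features, residual, width, rng):
    """Fit one grade on top of frozen features (hypothetical helper).

    Assumption: hidden weights are random (He-scaled) and NOT trained; only the
    linear output layer is fitted to the current residual by least squares.
    """
    d = features.shape[1]
    W = rng.standard_normal((d, width)) * np.sqrt(2.0 / d)
    b = rng.uniform(-1.0, 1.0, width)
    hidden = relu(features @ W + b)  # this grade's features, frozen afterwards
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])  # output bias
    coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return hidden, design @ coef  # (new features, this grade's output)


def multigrade_fit(x, y, num_grades=5, width=32, seed=0):
    """Train grades one at a time; each grade fits what the earlier ones missed."""
    rng = np.random.default_rng(seed)
    features = x.reshape(-1, 1) if x.ndim == 1 else x
    residual = y.astype(float).copy()
    prediction = np.zeros_like(residual)
    norms = [np.linalg.norm(residual)]
    for _ in range(num_grades):
        # Earlier grades are frozen: only their output features are reused.
        features, grade_output = train_grade(features, residual, width, rng)
        prediction += grade_output
        residual = residual - grade_output
        norms.append(np.linalg.norm(residual))
    return prediction, norms


x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3 * np.pi * x)
pred, norms = multigrade_fit(x, y)
print([round(float(n), 4) for n in norms])  # residual norm after each grade
```

In this simplified version the residual norm can never increase from one grade to the next, because choosing a zero output layer is always a feasible least-squares solution. That mirrors the first finding only for this toy scheme; the paper's guarantees come from its own analysis, not from this argument.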
