Super-fast Rates of Convergence for Neural Network Classifiers under the Hard Margin Condition

Nathanael Tepakbong; Xiang Zhou; Ding-Xuan Zhou

arXiv:2505.08262·cs.LG·May 6, 2026

TL;DR

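This paper proves that deep neural network classifiers trained by penalized empirical risk minimization can achieve excess risk rates of order n^{-α} with α close to 1 under Tsybakov's low-noise condition, and with arbitrarily large α under the hard margin condition, provided the regression function satisfies a smoothness condition.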

Contribution

The paper establishes new excess risk bounds for DNN classifiers under Tsybakov's low-noise condition and the hard margin condition, including a novel risk decomposition technique.
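
For reference, the two noise conditions named here are usually stated as follows. This is the standard textbook form, written under the assumption that labels take values $Y \in \{-1, +1\}$ and that the regression function is $\eta(x) = \mathbb{E}[Y \mid X = x]$; the constants $C$ and $\delta$ are illustrative, and the paper's exact normalization may differ.

```latex
% Tsybakov low-noise condition with exponent q > 0 (assumed standard form):
\mathbb{P}\bigl( |\eta(X)| \le t \bigr) \le C\, t^{q} \quad \text{for all } t > 0 .

% Hard margin condition, the limit case q = \infty:
|\eta(X)| \ge \delta \quad \text{almost surely, for some } \delta > 0 .
```

A larger $q$ means less probability mass near the decision boundary $\eta(x) = 0$, which is why the achievable rates improve as $q$ grows.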

Findings

01

Achieves excess risk bounds of order n^{-α} with α close to 1 under the low-noise condition.

02

Attains arbitrarily fast rates under the hard-margin condition for certain activation functions.

03

Provides minimax lower bounds showing the optimality of these rates for q ≥ 2.

Abstract

We study the classical binary classification problem for hypothesis spaces of Deep Neural Networks (DNNs) under Tsybakov's low-noise condition with exponent $q > 0$, as well as its limit case $q = \infty$, which we refer to as the \emph{hard margin condition}. We demonstrate that, for a wide range of commonly used activation functions (including but not limited to ReLU, LeakyReLU, ELU, CELU, SELU, Softplus, GELU, SiLU, Swish, Mish, and Softmax), DNN solutions to the empirical risk minimization (ERM) problem with square loss surrogate and $\ell_p$ penalty on the weights $(0 < p < \infty)$ can achieve excess risk bounds of order $O(n^{-\alpha})$ for $\alpha$ close to $1$ under the low-noise condition, and for arbitrarily large $\alpha > 1$ under the hard-margin condition, provided that the Bayes regression function $\eta$ satisfies a \emph{distribution-adapted smoothness}…
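
The penalized ERM objective described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's construction: a two-layer ReLU network, labels in {-1, +1}, a penalty of the form lam * sum |w|^p applied to all weights and biases, and the hypothetical names `forward`, `lp_penalty`, `penalized_square_risk`, and `classify`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(params, X):
    """Two-layer ReLU network (illustrative architecture); params = (W1, b1, w2, b2)."""
    W1, b1, w2, b2 = params
    return relu(X @ W1 + b1) @ w2 + b2

def lp_penalty(params, p):
    """Sum of |w|^p over every parameter array (assumed penalty form, 0 < p < inf)."""
    return sum(float(np.sum(np.abs(a) ** p)) for a in params)

def penalized_square_risk(params, X, y, lam, p):
    """Empirical square-loss risk plus lam times the l_p penalty; y takes values in {-1, +1}."""
    residual = forward(params, X) - y
    return float(np.mean(residual ** 2)) + lam * lp_penalty(params, p)

def classify(params, X):
    """Plug-in classifier: sign of the network output (ties assigned to +1)."""
    return np.where(forward(params, X) >= 0, 1, -1)

# Tiny hand-built example: the network reproduces the labels exactly,
# so the objective reduces to the penalty term alone.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
params = (np.eye(2), np.zeros(2), np.array([1.0, -1.0]), np.array(0.0))

risk = penalized_square_risk(params, X, y, lam=0.1, p=1.0)
labels = classify(params, X)
```

The classifier is obtained by thresholding the real-valued network output at zero, so the square loss acts only as a surrogate for the 0-1 classification error. In practice the objective would be minimized over `params` with an optimizer; that step is omitted here.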
