A result relating convex n-widths to covering numbers with some applications to neural networks

Jonathan Baxter; Peter Bartlett

arXiv:2512.04912·cs.LG·December 5, 2025

TL;DR

This paper establishes a connection between convex n-widths and covering numbers, providing new bounds on the approximation capabilities of neural networks, especially one-hidden-layer models, in high-dimensional settings.

Contribution

It introduces a general relation between approximation errors and covering numbers of convex cores, with specific bounds for neural network function classes.

Findings

01. Covering numbers of neural network classes are bounded by those of their convex cores.

02. Derived upper bounds on neural network approximation rates.

03. Applicable to high-dimensional pattern recognition problems.

Abstract

In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or ``features'' is known to be hard. Typically, the worst-case error of the best basis set decays only as fast as $\Theta(n^{-1/d})$, where $n$ is the number of basis functions and $d$ is the input dimension. However, there are many examples of high-dimensional pattern recognition problems (such as face recognition) where linear combinations of small sets of features do solve the problem well. Hence these function classes do not suffer from the ``curse of dimensionality'' associated with more general classes. It is natural then, to look for characterizations of high-dimensional function classes that nevertheless are approximated well by linear combinations of small sets of features. In this paper we give a general result relating the error…
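
To make the $n^{-1/d}$ rate quoted in the abstract concrete, the following minimal Python sketch shows how quickly the required number of basis functions grows with the input dimension. It assumes the constant hidden in the $\Theta$ notation is exactly 1, which the abstract does not state; the function names are hypothetical and are not taken from the paper.

```python
import math


def worst_case_error(n: int, d: int) -> float:
    """Error n**(-1/d), assuming the hidden Theta constant is 1 (illustrative only)."""
    return n ** (-1.0 / d)


def log10_basis_functions_needed(eps: float, d: int) -> float:
    """log10 of the n that solves n**(-1/d) = eps, i.e. n = eps**(-d).

    The logarithm is returned because n itself overflows quickly as d grows.
    """
    return -d * math.log10(eps)


if __name__ == "__main__":
    target_error = 0.1  # hypothetical accuracy target
    for d in (1, 2, 10, 100):
        exponent = log10_basis_functions_needed(target_error, d)
        print(f"d = {d:>3}: roughly 10^{exponent:.0f} basis functions")
```

Under this assumption the required $n$ is exponential in $d$, which is the ``curse of dimensionality'' the abstract refers to. The function classes the paper is interested in are those that escape this scaling.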

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM