Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels
Weinan E, Stephan Wojtowytsch

TL;DR
This paper shows that kernel methods, and neural networks with small path norm, are inherently poor approximators of specific function classes in high dimension.
Contribution
It introduces a scale separation technique of Kolmogorov width type and applies it to show that kernel spaces are poor approximators of shallow networks, and that networks with small path norm are poor approximators of certain Lipschitz functions, in high dimension.
Findings
Reproducing kernel Hilbert spaces are poor $L^2$-approximators for two-layer neural networks.
Multi-layer networks with small path norm poorly approximate certain Lipschitz functions.
Both results follow from a general scale separation argument, which applies whenever a sequence of linear maps converges much faster on one subspace of a Banach space than on another.
Abstract
We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces. The general technique is then applied to show that reproducing kernel Hilbert spaces are poor $L^2$-approximators for the class of two-layer neural networks in high dimension, and that multi-layer networks with small path norm are poor approximators for certain Lipschitz functions, also in the $L^2$-topology.
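For orientation, the classical Kolmogorov $n$-width can be written as below. This is background only: the abstract describes its result as being "of Kolmogorov width type", so the paper's own quantity may differ from this textbook form. The symbols $K$, $X$, $V_n$ and $d_n$ are notation assumed here, not taken from the source.

```latex
% Classical Kolmogorov n-width of a set K in a Banach space X
% (textbook definition; an assumption about notation, not the paper's exact statement).
% V_n ranges over all linear subspaces of X of dimension at most n.
\[
  d_n(K; X) \;=\; \inf_{\substack{V_n \subseteq X \\ \dim V_n \le n}}
  \;\sup_{f \in K}\; \inf_{g \in V_n} \, \| f - g \|_X .
\]
```

Read this way, a slowly decaying $d_n$ means that no $n$-dimensional linear space approximates every element of $K$ well, which is the sense in which a function class can be called a poor approximator of another.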
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
