Approximation Theory for Neural Networks: Old and New
Soumendu Sundar Mukherjee, Himasish Talukdar

TL;DR
This survey reviews the evolution of approximation theory for neural networks, highlighting classical results, quantitative bounds, depth-width trade-offs, and recent developments on Kolmogorov--Arnold Networks.
Contribution
It provides a comprehensive overview of both classical and recent approximation results, emphasizing depth--width trade-offs and alternative architectures like KANs.
Findings
For structured target functions, deeper networks can reach a given approximation accuracy with fewer parameters than shallow ones.
Quantitative bounds relate approximation error to network size and function smoothness.
Recent work on Kolmogorov--Arnold Networks (KANs) extends the theory to alternative architectures.
Abstract
Universal approximation theorems provide a mathematical explanation for the expressive power of neural networks. They assert that, under mild conditions on the activation function, feedforward neural networks are dense in broad function classes, such as continuous functions on compact subsets of $\mathbb{R}^d$, $L^p$ spaces, or Sobolev spaces. Over the past four decades, these qualitative universality results have evolved into a rich quantitative theory addressing approximation rates, parameter efficiency, and the role of architectural features such as depth and width. This survey presents several glimpses into this theory. We review classical density results for single-hidden-layer networks, as well as quantitative bounds that relate approximation error to network size and smoothness assumptions on target functions. Particular emphasis is placed on depth--width trade-offs and on…
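As a small illustration of the quantitative theme in the abstract, the sketch below builds a single-hidden-layer ReLU network by hand and measures how its uniform error shrinks as the hidden layer widens. It is not taken from the survey: the target function sin(2πx), the interval [0, 1], and the helper names `relu_interpolant` and `sup_error` are all assumptions chosen for illustration. The construction is ordinary piecewise-linear interpolation written as a sum of ReLU units, so for a twice-differentiable target the error is at most h^2/8 times the maximum of |f''|, where h is the knot spacing.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def relu_interpolant(f, n_units):
    """One-hidden-layer ReLU network interpolating f at n_units + 1 uniform knots on [0, 1].

    Hypothetical helper for illustration; weights are set in closed form, not trained.
    """
    knots = np.linspace(0.0, 1.0, n_units + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)  # slope on each interval
    # Output weights: the first slope, then the change in slope at each interior knot.
    weights = np.concatenate(([slopes[0]], np.diff(slopes)))
    biases = knots[:-1]  # hidden unit k computes ReLU(x - knots[k])
    offset = values[0]

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] - biases)  # shape (..., n_units)
        return offset + hidden @ weights

    return network


def sup_error(f, n_units, n_grid=10001):
    """Maximum absolute error on a fine grid of [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    net = relu_interpolant(f, n_units)
    return float(np.max(np.abs(f(grid) - net(grid))))


def target(x):
    # Assumed smooth target; its second derivative is bounded by (2*pi)**2.
    return np.sin(2.0 * np.pi * x)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, sup_error(target, n))
```

Under these assumptions, quadrupling the number of hidden units cuts the error bound by a factor of 16, which is a simple instance of a bound relating approximation error to network size and smoothness of the target.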
