On the Expressive Power of Neural Networks

Jan Holstermann

arXiv:2306.00145 · math.CA · June 2, 2023 · 2 cites

TL;DR

This paper investigates the expressive capabilities of neural networks, comparing shallow and deep architectures across various norms and activation functions, and introduces a new framework to analyze their approximation power.

Contribution

The paper introduces a new framework with two measures of expressive power, improves bounds on the number of linear regions, and addresses open questions about the approximation capabilities of networks.

Findings

01 Improved bounds on the number of linear regions in ReLU networks.

02 Analysis of approximation limits between shallow and deep networks.

03 Extension of universal approximation results to new norms and activation functions.

Abstract

In 1989 George Cybenko proved in a landmark paper that wide shallow neural networks can approximate arbitrary continuous functions on a compact set. This universal approximation theorem sparked a lot of follow-up research. Shen, Yang and Zhang determined optimal approximation rates for ReLU-networks in $L^{p}$-norms with $p \in [1, \infty)$. Kidger and Lyons proved a universal approximation theorem for deep narrow ReLU-networks. Telgarsky gave an example of a deep narrow ReLU-network that cannot be approximated by a wide shallow ReLU-network unless it has exponentially many neurons. However, there are even more questions that still remain unresolved. Are there any wide shallow ReLU-networks that cannot be approximated well by deep narrow ReLU-networks? Is the universal approximation theorem still true for other norms like the Sobolev norm $W^{1,1}$? Do these results hold for activation…
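
The depth-versus-width separation attributed to Telgarsky in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It uses the standard sawtooth construction: a tent map written with two ReLU neurons, composed `depth` times. The names `tent`, `sawtooth`, and `count_linear_pieces`, and the choice of a dyadic grid, are assumptions made for this illustration.

```python
def relu(x):
    return max(0.0, x)


def tent(x):
    # One hidden layer with two ReLU neurons.
    # On [0, 1] it maps 0 -> 0, 0.5 -> 1, 1 -> 0.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def sawtooth(x, depth):
    # Deep narrow network: `depth` layers of width 2, i.e. 2 * depth neurons.
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(f, n_grid):
    # Count the linear pieces of f on [0, 1] by comparing slopes on a uniform grid.
    # Assumption: every breakpoint of f lies on the grid, so no grid cell
    # straddles a kink and the slopes are computed exactly.
    ys = [f(i / n_grid) for i in range(n_grid + 1)]
    slopes = [(ys[i + 1] - ys[i]) * n_grid for i in range(n_grid)]
    pieces = 1
    for left, right in zip(slopes, slopes[1:]):
        if left != right:
            pieces += 1
    return pieces


if __name__ == "__main__":
    for depth in range(1, 6):
        # Breakpoints are multiples of 2**-depth, so this dyadic grid contains them.
        n_grid = 2 ** (depth + 2)
        pieces = count_linear_pieces(lambda x: sawtooth(x, depth), n_grid)
        print(f"depth={depth}  neurons={2 * depth}  linear pieces={pieces}")
```

In this sketch the number of linear pieces doubles with every layer (2, 4, 8, 16, 32 for depths 1 to 5), while the neuron count grows only linearly. A univariate ReLU network with one hidden layer of m neurons has at most m + 1 linear pieces, so representing the same function exactly would take at least 2^depth - 1 neurons. The sketch only counts pieces; it does not reproduce the approximation lower bound itself.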

Taxonomy

Topics: Machine Learning and Algorithms · Neural Networks and Applications · Stochastic Gradient Optimization Techniques