Minimum Width of Deep Narrow Networks for Universal Approximation

Xiao-Song Yang; Qi Zhou; Xuan Zhou

arXiv:2511.06837 · cs.LG · November 25, 2025

TL;DR

This paper establishes bounds on the minimum width of fully connected neural networks needed for universal approximation, considering various activation functions and providing new geometric proofs.

Contribution

The paper gives upper and lower bounds on the minimum width that fully connected networks need for universal approximation with LeakyReLU, ELU, CELU, SELU, and Softplus activation functions, together with new proofs of these bounds.

Findings

01

For ELU and SELU activations, the minimum width satisfies the upper bound w_min ≤ max(2d_x+1, d_y), where d_x and d_y are the input and output dimensions.

02

For LeakyReLU, ELU, CELU, SELU, and Softplus activations, the minimum width satisfies d_x+1 ≤ w_min ≤ d_x+d_y.

03

The paper gives a new geometric proof of the lower bound for the case where the activation function is injective.
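The bounds listed in the findings can be evaluated for concrete dimensions. The following is a minimal sketch, not code from the paper: the function name `width_bounds` and the choice to take the smaller of the two stated upper bounds for ELU and SELU are assumptions made here for illustration (the latter follows only because both inequalities are stated to hold for those activations).

```python
def width_bounds(d_x, d_y, activation):
    """Return (lower, upper) bounds on w_min as stated in the findings.

    Hypothetical helper for illustration; not part of the paper.
    """
    # Activations for which d_x + 1 <= w_min <= d_x + d_y is stated.
    two_sided = {"leakyrelu", "elu", "celu", "selu", "softplus"}
    # Activations for which w_min <= max(2*d_x + 1, d_y) is also stated.
    max_bound = {"elu", "selu"}

    name = activation.lower()
    if name not in two_sided:
        raise ValueError(f"no bound listed for activation {activation!r}")
    if d_x < 1 or d_y < 1:
        raise ValueError("dimensions must be positive integers")

    lower = d_x + 1
    upper = d_x + d_y
    if name in max_bound:
        # Assumption: both upper bounds hold, so the smaller one applies.
        upper = min(upper, max(2 * d_x + 1, d_y))
    return lower, upper


if __name__ == "__main__":
    for d_x, d_y, act in [(3, 6, "ELU"), (2, 1, "Softplus"), (1, 1, "SELU")]:
        print(act, d_x, d_y, width_bounds(d_x, d_y, act))
```

For example, with d_x = 3 and d_y = 6 the ELU bounds are 4 ≤ w_min ≤ 7, since max(2·3+1, 6) = 7 is smaller than d_x + d_y = 9.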

Abstract

Determining the minimum width of fully connected neural networks has become a fundamental problem in recent theoretical studies of deep neural networks. In this paper, we study the lower and upper bounds of the minimum width required for fully connected neural networks to have universal approximation capability, which is important in network design and training. We show that $w_{\min} \leq \max(2 d_{x} + 1, d_{y})$ also holds for networks with ELU and SELU activation functions, and that the upper bound of this inequality is attained when $d_{y} = 2 d_{x}$, where $d_{x}$ and $d_{y}$ denote the input and output dimensions, respectively. Besides, we show that $d_{x} + 1 \leq w_{\min} \leq d_{x} + d_{y}$ for networks with LeakyReLU, ELU, CELU, SELU, and Softplus activation functions, by proving that the ReLU activation function can be approximated by these activation functions. In addition, in the case that the activation…
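The idea that ReLU can be approximated by a smooth activation can be illustrated numerically for Softplus. The sketch below uses a standard rescaling, softplus(kx)/k, whose gap to ReLU is largest at x = 0 and equals log(2)/k; this is an illustration under that assumption and is not claimed to be the construction used in the paper. The names `scaled_softplus` and `max_gap` are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def softplus(x):
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def scaled_softplus(x, k):
    # Rescaling input by k and output by 1/k uses only affine maps,
    # which a network can absorb into its weights.
    return softplus(k * x) / k


def max_gap(k, lo=-5.0, hi=5.0, n=2001):
    # Largest absolute difference to ReLU on a uniform grid over [lo, hi].
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(scaled_softplus(x, k) - relu(x)) for x in xs)


if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, max_gap(k), math.log(2) / k)
```

The printed gap shrinks in proportion to 1/k, so the rescaled Softplus converges uniformly to ReLU as k grows.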

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM