The Power of Depth for Feedforward Neural Networks

Ronen Eldan; Ohad Shamir

arXiv:1512.03965·cs.LG·May 10, 2016·216 cites

TL;DR

This paper shows that adding a single layer to a feedforward neural network can increase its expressive power exponentially: some functions computed by a small 3-layer network cannot be approximated by any 2-layer network unless its width is exponential in the input dimension.

Contribution

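It gives a formal proof that depth can be exponentially more valuable than width for standard feedforward networks. The result holds for virtually all known activation functions and, compared to related results for Boolean functions, requires fewer assumptions and uses a very different construction and proof technique.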

Findings

01

Small 3-layer networks can represent functions that no 2-layer network can approximate beyond a certain constant accuracy unless its width is exponential in the dimension.

02

The result applies to standard activation functions like ReLUs, sigmoids, and thresholds.

03

Increasing depth by just one layer can be exponentially more valuable than increasing width.
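
To make the findings above concrete, the two network classes and the shape of the separation can be written out as follows. This is a paraphrase in notation chosen here, not the paper's exact theorem: the width $N$, the widths $N_1, N_2$, the constant $c$, and the input distribution $\mu$ are placeholder symbols, and the precise constants and conditions are in the paper itself.

```latex
% 2-layer network of width N with activation \sigma:
f(x) = \sum_{i=1}^{N} v_i \, \sigma\bigl(\langle w_i, x \rangle + b_i\bigr),
\qquad x \in \mathbb{R}^d .

% 3-layer network (two hidden layers):
g(x) = \sum_{i=1}^{N_2} u_i \, \sigma\Bigl( \sum_{j=1}^{N_1} v_{i,j} \,
       \sigma\bigl(\langle w_{i,j}, x \rangle + b_{i,j}\bigr) + c_i \Bigr).

% Shape of the separation (informal): for some small 3-layer g and some
% distribution \mu on \mathbb{R}^d, every 2-layer f whose width N is
% sub-exponential in d satisfies
\mathbb{E}_{x \sim \mu} \bigl( f(x) - g(x) \bigr)^2 \;\ge\; c
\quad \text{for a constant } c > 0 \text{ independent of } d .
```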

Abstract

We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for virtually all known activation functions, including rectified linear units, sigmoids and thresholds, and formally demonstrates that depth -- even if increased by 1 -- can be exponentially more valuable than width for standard feedforward neural networks. Moreover, compared to related results in the context of Boolean functions, our result requires fewer assumptions, and the proof techniques and construction are very different.
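
The "expressible by a small 3-layer network" half of the abstract can be illustrated with a short sketch. It shows one standard way to build a radial function with two hidden ReLU layers: the first layer approximates each $x_i^2$ and sums them into an estimate of $\|x\|^2$, and the second layer approximates a one-dimensional profile of that estimate. The target $\sin(\|x\|^2)$, the function names, and all sizes below are assumptions chosen for illustration; this is not the paper's specific function, and it does not demonstrate the 2-layer lower bound.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def fit_relu_interpolant(f, lo, hi, n_pieces):
    """Weights of a one-hidden-layer ReLU net that linearly interpolates f on [lo, hi]."""
    knots = [lo + (hi - lo) * k / n_pieces for k in range(n_pieces + 1)]
    vals = [f(t) for t in knots]
    slopes = [(vals[k + 1] - vals[k]) / (knots[k + 1] - knots[k])
              for k in range(n_pieces)]
    # The unit relu(t - knots[k]) contributes the change in slope at knots[k].
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_pieces)]
    return vals[0], knots[:-1], coeffs


def apply_relu_net(params, t):
    bias, knots, coeffs = params
    return bias + sum(c * relu(t - k) for k, c in zip(knots, coeffs))


def radial_profile(s):
    # Hypothetical target (an assumption for this sketch): f(x) = sin(||x||^2).
    return math.sin(s)


def three_layer_radial(x, square_net, profile_net):
    # Hidden layer 1: per-coordinate ReLU units whose weighted sum approximates ||x||^2.
    s = sum(apply_relu_net(square_net, xi) for xi in x)
    # Hidden layer 2: ReLU units acting on s approximate the profile; the output is linear.
    return apply_relu_net(profile_net, s)


def max_error(d=5, radius=2.0, n_square=64, n_profile=200, n_samples=500, seed=0):
    """Largest observed error on random points in the ball of the given radius."""
    rng = random.Random(seed)
    square_net = fit_relu_interpolant(lambda t: t * t, -radius, radius, n_square)
    profile_net = fit_relu_interpolant(radial_profile, 0.0, radius ** 2 + 1.0, n_profile)
    worst = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        r = radius * rng.random()
        x = [r * v / norm for v in g]  # random direction, random radius in [0, radius)
        exact = radial_profile(sum(v * v for v in x))
        approx = three_layer_radial(x, square_net, profile_net)
        worst = max(worst, abs(approx - exact))
    return worst


if __name__ == "__main__":
    print("max observed error:", max_error())
```

In this sketch the first hidden layer uses `d * n_square` units and the second uses `n_profile` units, so the total width grows only linearly with the dimension `d` for a fixed per-coordinate resolution. The abstract's claim is that no comparably small 2-layer network can match a suitably chosen radial target.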

Taxonomy

Topics: Machine Learning and Algorithms · Neural Networks and Applications · Stochastic Gradient Optimization Techniques