Learning Functions: When Is Deep Better Than Shallow

Hrushikesh Mhaskar; Qianli Liao; Tomaso Poggio

arXiv:1603.00988·cs.LG·May 31, 2016·107 cites

TL;DR

This paper proves that deep networks can approximate compositional functions to the same accuracy as shallow networks while requiring exponentially fewer parameters, settling a longstanding conjecture by Bengio about the role of depth in neural networks.
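
A back-of-the-envelope count makes "exponentially fewer parameters" concrete. The sketch below assumes the bounds as they are commonly stated for this line of work: a shallow network needs on the order of ε^(-n/m) units to approximate a function of n variables with smoothness m to accuracy ε, while a binary-tree network matching a compositional function needs on the order of (n - 1)·ε^(-2/m). All constants are set to 1, so the numbers are illustrative only, and the function names are hypothetical.

```python
def shallow_units(eps: float, n: int, m: int) -> float:
    """Order-of-magnitude unit count for a one-hidden-layer network: eps**(-n/m).

    Assumed scaling with constants dropped; not an exact count.
    """
    return eps ** (-n / m)


def deep_units(eps: float, n: int, m: int) -> float:
    """Order-of-magnitude unit count for a binary-tree network.

    Assumes (n - 1) two-variable nodes, each costing about eps**(-2/m) units.
    """
    if n < 2:
        raise ValueError("a binary tree needs at least 2 inputs")
    return (n - 1) * eps ** (-2 / m)


if __name__ == "__main__":
    eps, m = 0.1, 2  # target accuracy and assumed smoothness
    for n in (2, 4, 8, 16):
        s, d = shallow_units(eps, n, m), deep_units(eps, n, m)
        print(f"n={n:2d}  shallow~{s:.3g}  deep~{d:.3g}")
```

At n = 2 the two counts coincide, because a single two-variable node is the whole tree; the gap then grows exponentially in n for the shallow count and only linearly for the deep one.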

Contribution

It establishes a theoretical foundation for the superiority of deep networks in approximating compositional functions, showing that they need exponentially fewer parameters and have an exponentially lower VC-dimension than shallow networks, which confirms Bengio's conjecture.

Findings

1. Deep networks approximate compositional functions with exponentially fewer parameters.
2. A theorem confirms the advantage of deep over shallow networks in approximation efficiency.
3. The paper defines criteria that justify the use of deep convolutional architectures.
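
One way to see the link between depth and convolutional architectures, offered as an illustration rather than as the paper's formal definition of scalable, shift-invariant algorithms: if every node of a binary-tree network applies the same two-input function h to a local pair of inputs, each level behaves like a convolution-like layer with window 2, stride 2, and shared weights, and cyclically shifting the input by two positions shifts that level's output by one. The node function h_example and the helper names are hypothetical.

```python
from typing import Callable, List


# Hypothetical two-input node function; any fixed h would do for this illustration.
def h_example(a: float, b: float) -> float:
    return max(a, b) + 0.5 * min(a, b)


def tree_level(xs: List[float], h: Callable[[float, float], float]) -> List[float]:
    """One level of a weight-shared binary tree.

    Each output sees one local, non-overlapping pair (window 2, stride 2),
    and every pair is processed by the same h (weight sharing).
    """
    if len(xs) % 2:
        raise ValueError("each level needs an even number of inputs")
    return [h(xs[i], xs[i + 1]) for i in range(0, len(xs), 2)]


def shared_tree(xs: List[float], h: Callable[[float, float], float]) -> float:
    """Reduce 2**k inputs to a single output by stacking k identical levels."""
    while len(xs) > 1:
        xs = tree_level(xs, h)
    return xs[0]


if __name__ == "__main__":
    xs = [1.0, 4.0, 2.0, 3.0, 0.0, 5.0, 6.0, 7.0]
    level = tree_level(xs, h_example)
    # Cyclically shifting the input by 2 positions shifts this level's output by 1.
    shifted = tree_level(xs[2:] + xs[:2], h_example)
    print(level, shifted, shared_tree(xs, h_example))
```

Because the same h is reused at every position, the number of distinct parameters per level does not grow with the input size, which is the "scalable" part of the intuition.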

Abstract

While the universal approximation property holds both for hierarchical and shallow networks, we prove that deep (hierarchical) networks can approximate the class of compositional functions with the same accuracy as shallow networks but with an exponentially lower number of training parameters as well as VC-dimension. This theorem settles an old conjecture by Bengio on the role of depth in networks. We then define a general class of scalable, shift-invariant algorithms to show a simple and natural set of requirements that justify deep convolutional networks.
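
To make "compositional function" concrete, consider a function of n = 8 variables with a binary-tree structure, f(x_1, ..., x_8) = h_3(h_21(h_11(x_1, x_2), h_12(x_3, x_4)), h_22(h_13(x_5, x_6), h_14(x_7, x_8))). The sketch below evaluates such a function level by level; the specific constituent functions and helper names are hypothetical. Every constituent depends on only two variables and there are n - 1 = 7 of them, so a deep network whose architecture mirrors the tree only has to approximate low-dimensional pieces.

```python
from typing import Callable, Dict, List, Tuple

Fn2 = Callable[[float, float], float]


def compose_tree(xs: List[float], nodes: Dict[Tuple[int, int], Fn2]) -> float:
    """Evaluate a binary-tree compositional function.

    nodes[(level, j)] is the two-variable constituent at position j of a level;
    level 1 reads the raw inputs. Constituents may all differ (no weight sharing).
    """
    if not xs or len(xs) & (len(xs) - 1):
        raise ValueError("number of inputs must be a power of 2")
    level = 1
    while len(xs) > 1:
        xs = [nodes[(level, j)](xs[2 * j], xs[2 * j + 1]) for j in range(len(xs) // 2)]
        level += 1
    return xs[0]


# Hypothetical constituents for n = 8 inputs: 4 + 2 + 1 = 7 = n - 1 nodes.
nodes: Dict[Tuple[int, int], Fn2] = {
    (1, 0): lambda a, b: a * b,      # h_11(x_1, x_2)
    (1, 1): lambda a, b: a + b,      # h_12(x_3, x_4)
    (1, 2): lambda a, b: a - b,      # h_13(x_5, x_6)
    (1, 3): lambda a, b: max(a, b),  # h_14(x_7, x_8)
    (2, 0): lambda a, b: a + 2 * b,  # h_21
    (2, 1): lambda a, b: a * b,      # h_22
    (3, 0): lambda a, b: a - b,      # h_3 (root)
}

if __name__ == "__main__":
    print(compose_tree([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], nodes))
```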

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms