Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Zhiyuan Li; Yi Zhang; Sanjeev Arora

arXiv:2010.08515·cs.LG·May 5, 2021·22 cites

TL;DR

This paper proves that, on a natural task, convolutional neural networks are more sample-efficient than fully-connected networks. The gap comes from the symmetry (orthogonal invariance) of standard training algorithms for fully-connected nets, not from the architecture alone.

Contribution

It establishes a provable sample-complexity gap between convolutional and fully-connected networks trained with standard algorithms on a natural task.

Findings

01 On the constructed task, O(1) samples suffice for convolutional nets.

02 Fully-connected nets trained with orthogonal-invariant algorithms need Ω(d^2) samples to generalize on the same task.

03 Similar results extend to adaptive training algorithms such as Adam and AdaGrad.

Abstract

Convolutional neural networks often dominate fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of 'better inductive bias'. However, this has not been made mathematically rigorous, and the hurdle is that the fully-connected net can always simulate the convolutional net (for a fixed task). Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on $\mathbb{R}^{d} \times \{\pm 1\}$ on which any orthogonal-invariant algorithm (i.e., fully-connected networks trained with most gradient-based methods from Gaussian initialization) requires $\Omega(d^{2})$ samples to generalize while $O(1)$ samples suffice for convolutional architectures. Furthermore, we…
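The symmetry behind the lower bound can be checked numerically. The sketch below is not the paper's construction: it assumes a two-layer ReLU network, squared loss, and full-batch gradient descent, and every name in it (`train_two_layer`, `predict`, the sizes `d`, `m`, `n`) is hypothetical. It rotates the inputs by a random orthogonal matrix `U`, rotates the Gaussian first-layer initialization to match, and compares the two trained networks on correspondingly rotated test points. Because a Gaussian initialization has the same distribution before and after rotation, training cannot tell the two coordinate systems apart.

```python
import numpy as np

def train_two_layer(X, y, W, a, lr=0.01, steps=100):
    """Full-batch gradient descent on squared loss for f(x) = a . relu(W x).

    Illustrative choice of model and loss, not the paper's setup.
    """
    W, a = W.copy(), a.copy()
    n = X.shape[0]
    for _ in range(steps):
        H = X @ W.T                      # (n, m) pre-activations
        A = np.maximum(H, 0.0)           # ReLU activations
        err = A @ a - y                  # (n,) residuals
        grad_a = A.T @ err / n
        grad_W = ((err[:, None] * a[None, :]) * (H > 0)).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

def predict(X, W, a):
    return np.maximum(X @ W.T, 0.0) @ a

rng = np.random.default_rng(0)
d, m, n = 8, 16, 20                      # input dim, hidden width, sample count
X = rng.standard_normal((n, d))          # rows are training inputs
y = np.sign(rng.standard_normal(n))      # arbitrary +/-1 labels
X_test = rng.standard_normal((5, d))

# Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Gaussian initialization; W0 @ U.T has the same distribution as W0.
W0 = rng.standard_normal((m, d)) / np.sqrt(d)
a0 = rng.standard_normal(m) / np.sqrt(m)

# Train once on the original data and once on rotated data x -> U x.
W1, a1 = train_two_layer(X, y, W0, a0)
W2, a2 = train_two_layer(X @ U.T, y, W0 @ U.T, a0)

preds_original = predict(X_test, W1, a1)
preds_rotated = predict(X_test @ U.T, W2, a2)
max_gap = np.max(np.abs(preds_original - preds_rotated))
print("max prediction gap:", max_gap)
```

The two runs should agree up to floating-point error, since the first-layer gradient on rotated data is the original gradient multiplied by `U.T` at every step. A convolutional layer has no such symmetry: rotating the input by an arbitrary orthogonal matrix destroys the local structure its filters rely on.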

Videos

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: AdaGrad · Stochastic Gradient Descent · Adam