Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Zhiyuan Li, Yi Zhang, Sanjeev Arora

TL;DR
This paper proves that convolutional neural networks are more sample-efficient than fully-connected networks on certain tasks. The gap comes from the symmetry properties of the training algorithm, not from expressive power alone.
Contribution
It establishes a mathematical sample complexity gap between convolutional and fully-connected networks for standard training algorithms on a natural task.
Findings
Convolutional nets require only O(1) samples for certain tasks.
Fully-connected nets need Ω(d^2) samples on the same tasks, because the algorithms that train them are orthogonal-invariant.
Results extend to various training algorithms like Adam and AdaGrad.
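The orthogonal invariance behind the Ω(d^2) finding can be checked numerically. The sketch below is an illustration only, not the paper's construction: it assumes a two-layer ReLU network, squared loss, full-batch gradient descent, and arbitrary small sizes, and all function and variable names are hypothetical. It trains once on the original data and once on data rotated by a random orthogonal matrix Q (with the first-layer initialization rotated to match), then compares predictions. Because a Gaussian initialization has the same distribution before and after rotation, the matching predictions mean the learner cannot distinguish a task from its rotated copy.

```python
import numpy as np

def train_two_layer(X, y, W1, w2, lr=0.02, steps=100):
    """Full-batch gradient descent on squared loss for f(x) = w2 . relu(W1 x)."""
    W1, w2 = W1.copy(), w2.copy()
    n = X.shape[0]
    for _ in range(steps):
        H = X @ W1.T                          # (n, m) pre-activations
        A = np.maximum(H, 0.0)                # ReLU activations
        err = A @ w2 - y                      # (n,) residuals
        grad_w2 = A.T @ err / n               # gradient w.r.t. output weights
        grad_H = np.outer(err, w2) * (H > 0)  # backprop through ReLU
        grad_W1 = grad_H.T @ X / n            # gradient w.r.t. first layer
        W1 -= lr * grad_W1
        w2 -= lr * grad_w2
    return W1, w2

def predict(X, W1, w2):
    return np.maximum(X @ W1.T, 0.0) @ w2

rng = np.random.default_rng(0)
d, m, n = 8, 16, 20                           # illustrative sizes (assumed)
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))           # arbitrary +/-1 labels
X_test = rng.standard_normal((5, d))

# Gaussian initialization; its distribution is unchanged by rotation.
W1 = rng.standard_normal((m, d)) / np.sqrt(d)
w2 = rng.standard_normal(m) / np.sqrt(m)

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Run A: original data. Run B: every input x replaced by Q x,
# with the first-layer initialization rotated accordingly.
W1_a, w2_a = train_two_layer(X, y, W1, w2)
W1_b, w2_b = train_two_layer(X @ Q.T, y, W1 @ Q.T, w2)

pred_plain = predict(X_test, W1_a, w2_a)
pred_rotated = predict(X_test @ Q.T, W1_b, w2_b)
max_gap = float(np.max(np.abs(pred_plain - pred_rotated)))
print("max prediction gap:", max_gap)
```

The two runs agree up to floating-point error because the first-layer gradient rotates together with the data, so the rotated weights stay equal to W1 Q^T at every step. A convolutional architecture does not have this symmetry, which is what leaves room for a sample complexity gap.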
Abstract
Convolutional neural networks often dominate fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of 'better inductive bias'. However, this has not been made mathematically rigorous, and the hurdle is that the fully-connected net can always simulate the convolutional net (for a fixed task). Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on ℝ^d × {±1} on which any orthogonal-invariant algorithm (i.e. fully-connected networks trained with most gradient-based methods from Gaussian initialization) requires Ω(d^2) samples to generalize while O(1) samples suffice for convolutional architectures. Furthermore, we…
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: AdaGrad · Stochastic Gradient Descent · Adam
