Big Neural Networks Waste Capacity
Yann N. Dauphin, Yoshua Bengio

TL;DR
This paper investigates why large neural networks often do not fully utilize their capacity to reduce underfitting, highlighting the limitations of first-order gradient descent in this regime and proposing potential solutions.
Contribution
It identifies the failure of current optimization methods to leverage the capacity of large neural networks, suggesting new directions for improving their generalization.
Findings
Training error shows highly diminishing returns as network size increases (experiments on ImageNet LSVRC-2010).
First-order gradient descent may be ineffective in the large-capacity regime.
Potential for improved generalization through alternative optimization or parametrization.
Abstract
This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be because there are highly diminishing returns for capacity in terms of training error, leading to underfitting. This suggests that the optimization method, first-order gradient descent, fails in this regime. Directly attacking this problem, either through the optimization method or the choice of parametrization, may make it possible to improve the generalization error on large datasets, for which a large capacity is required.
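One way to make "diminishing returns for capacity in terms of training error" concrete is to measure how much training error drops per added hidden unit between consecutive network sizes. The sketch below is only an illustration of that idea, not the authors' procedure: the function names `marginal_returns` and `has_diminishing_returns` are hypothetical, and the numbers in the usage example are made-up placeholders, not results from the paper.

```python
def marginal_returns(hidden_units, train_errors):
    """Drop in training error per added hidden unit between consecutive sizes.

    hidden_units: strictly increasing network sizes (hypothetical input).
    train_errors: training error measured at each size, same length.
    """
    if len(hidden_units) != len(train_errors):
        raise ValueError("hidden_units and train_errors must have the same length")
    returns = []
    for i in range(1, len(hidden_units)):
        added = hidden_units[i] - hidden_units[i - 1]
        if added <= 0:
            raise ValueError("hidden_units must be strictly increasing")
        # Positive value: training error went down when capacity was added.
        returns.append((train_errors[i - 1] - train_errors[i]) / added)
    return returns


def has_diminishing_returns(returns):
    """True if each added block of units helps no more than the previous one."""
    return all(later <= earlier for earlier, later in zip(returns, returns[1:]))


# Placeholder values for illustration only; NOT measurements from the paper.
example_sizes = [1000, 2000, 4000]
example_errors = [0.50, 0.40, 0.35]
example_returns = marginal_returns(example_sizes, example_errors)
print(example_returns, has_diminishing_returns(example_returns))
```

If the per-unit return keeps shrinking while the training error remains well above zero, the extra capacity is not being used to fit the training set, which is the underfitting symptom the abstract describes.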
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
