Frivolous Units: Wider Networks Are Not Really That Wide
Stephen Casper, Xavier Boix, Vanessa D'Amario, Ling Guo, Martin Schrimpf, Kasper Vinken, Gabriel Kreiman

TL;DR
This paper investigates the role of 'frivolous' units in overparameterized neural networks, revealing how their proliferation explains why increasing network width does not impair accuracy and shedding light on implicit regularization.
Contribution
The paper identifies two types of frivolous units, prunable and redundant, that emerge as width increases, and explains their influence on network complexity and regularization.
Findings
Frivolous units proliferate as network width increases.
Prunable units can be removed with minimal impact on output.
Redundant units' activities are linear combinations of others.
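The two definitions above can be illustrated with a minimal NumPy sketch on synthetic activations. This is not the paper's procedure. It assumes that "removing" a unit means setting its activation to zero, and that redundancy can be scored by the R^2 of a least-squares fit from the other units. The helper names `ablation_effect` and `redundancy_r2` and the toy layer are hypothetical.

```python
import numpy as np

def ablation_effect(acts, w_out, j):
    """Relative change in the layer's output when unit j is zeroed out (assumed ablation)."""
    full = acts @ w_out
    ablated_acts = acts.copy()
    ablated_acts[:, j] = 0.0
    ablated = ablated_acts @ w_out
    return np.linalg.norm(full - ablated) / np.linalg.norm(full)

def redundancy_r2(acts, j):
    """R^2 of a least-squares fit of unit j's activity from all other units."""
    target = acts[:, j]
    others = np.delete(acts, j, axis=1)
    design = np.column_stack([others, np.ones(len(target))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    total = target - target.mean()
    return 1.0 - (residual @ residual) / (total @ total)

# Synthetic layer: 500 inputs, 6 units (toy data, not from the paper).
rng = np.random.default_rng(0)
n_samples = 500
base = rng.normal(size=(n_samples, 4))           # units 0-3: independent activity
redundant = 0.5 * base[:, 0] - 2.0 * base[:, 1]  # unit 4: linear mix of units 0 and 1
weak = rng.normal(size=n_samples)                # unit 5: independent activity
acts = np.column_stack([base, redundant, weak])

w_out = rng.normal(size=(6, 3))                  # outgoing weights to 3 outputs
w_out[5] = 1e-4 * rng.normal(size=3)             # unit 5 barely influences the output

effects = [ablation_effect(acts, w_out, j) for j in range(6)]
r2 = [redundancy_r2(acts, j) for j in range(6)]

for j in range(6):
    print(f"unit {j}: ablation effect = {effects[j]:.4f}, R^2 from others = {r2[j]:.4f}")
```

In this toy setup, unit 5 behaves like a prunable unit (a near-zero ablation effect) and unit 4 like a redundant one (R^2 close to 1). Because linear dependence is symmetric, units 0 and 1 also score a high R^2: each can be recovered from unit 4 and the other.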
Abstract
A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy does not degrade when the network's width is increased. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of large networks to the learning task at hand. However, these compressible representations are poorly understood. A promising strand of research inspired from biology is understanding representations at the unit level as it offers a more granular and intuitive interpretation of the neural mechanisms. In order to better understand what facilitates increases in width without decreases in accuracy, we ask: Are there mechanisms at the unit level by which networks control their effective complexity as their width is increased? If so, how do these depend on the architecture, dataset, and training parameters? We identify two distinct…
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Machine Learning in Materials Science
Methods: Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Softmax · 1x1 Convolution · Batch Normalization
