It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task
Hannah Pinson

TL;DR
This paper investigates how gradient descent dynamically reduces neural network capacity during training, explaining phenomena like neuron merging, pruning, and the lottery ticket hypothesis through three key principles.
Contribution
It introduces three dynamical principles (mutual alignment, unlocking, and racing) that explain capacity reduction and neuron specialization during training.
Findings
Identifies three principles explaining capacity reduction in training.
Explains the lottery ticket phenomenon through neuron dynamics.
Provides a theoretical framework for neuron merging and pruning.
Abstract
Our theoretical understanding of neural networks is lagging behind their empirical success. One of the important unexplained phenomena is why and how, during training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. Here we investigate the mechanism by which gradient descent achieves this by analyzing the learning dynamics at the level of individual neurons in single-hidden-layer ReLU networks. We identify three dynamical principles, namely mutual alignment, unlocking, and racing, that together explain why we can often successfully reduce capacity after training through the merging of equivalent neurons or the pruning of low-norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, or why the specific, beneficial initial conditions of some neurons lead them to…
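The two post-training reductions named in the abstract, merging equivalent neurons and pruning low-norm weights, can be illustrated with a small NumPy sketch. This is not the paper's procedure. It assumes a bias-free single-hidden-layer ReLU network f(x) = sum_i a_i * relu(w_i . x), treats two neurons as "equivalent" when their incoming weight vectors point in the same direction, and uses |a_i| * ||w_i|| as a pruning score. The function names, the tolerance, and the threshold are all hypothetical. The merge is exact because ReLU is positively homogeneous: relu(c * z) = c * relu(z) for c > 0.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W, a, X):
    """Bias-free single-hidden-layer ReLU net: f(x) = sum_i a_i * relu(w_i . x)."""
    return relu(X @ W.T) @ a

def merge_aligned(W, a, tol=1e-8):
    """Merge neurons whose incoming weights share a direction (assumed notion of 'equivalent').

    Each neuron a_i * relu(w_i . x) equals (a_i * ||w_i||) * relu(u_i . x) with u_i the
    unit direction, so neurons with the same u_i collapse into one by summing a_i * ||w_i||.
    """
    dirs, outs = [], []
    for w, a_i in zip(W, a):
        n = np.linalg.norm(w)
        if n < tol:
            continue  # zero incoming weights and no bias: the neuron outputs 0 everywhere
        u = w / n
        for k, d in enumerate(dirs):
            if np.dot(u, d) > 1.0 - tol:  # same direction up to tolerance
                outs[k] += a_i * n
                break
        else:
            dirs.append(u)
            outs.append(a_i * n)
    return np.array(dirs), np.array(outs)

def prune_low_norm(W, a, threshold):
    """Drop neurons whose score |a_i| * ||w_i|| falls below a hypothetical threshold."""
    keep = np.abs(a) * np.linalg.norm(W, axis=1) >= threshold
    return W[keep], a[keep]

# Toy network: two pairs of aligned neurons plus one neuron with tiny weights.
w1 = np.array([1.0, 2.0, -1.0])
w2 = np.array([0.5, -1.0, 2.0])
W = np.vstack([w1, 2.5 * w1, w2, 0.5 * w2, 1e-6 * np.ones(3)])
a = np.array([1.0, -0.3, 0.7, 0.2, 1e-6])

X = np.random.default_rng(0).normal(size=(100, 3))

W_merged, a_merged = merge_aligned(W, a)                            # 5 neurons -> 3
W_small, a_small = prune_low_norm(W_merged, a_merged, threshold=1e-3)  # 3 neurons -> 2

max_merge_err = np.max(np.abs(forward(W, a, X) - forward(W_merged, a_merged, X)))
max_prune_err = np.max(np.abs(forward(W, a, X) - forward(W_small, a_small, X)))
```

In this toy setting the merge leaves the network function unchanged up to floating-point error, while pruning changes it only by the negligible contribution of the removed neuron. Networks with biases or other activations would need a different equivalence test.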
