The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks
Samuel Lippl, L. F. Abbott, SueYeon Chung

TL;DR
This paper analyzes the infinite-time limit of gradient-descent training on gated linear networks (GLNs), showing how architectural constraints and the implicit bias of gradient descent shape what these networks learn and how well they perform.
Contribution
It derives the infinite-time training limit for gated linear networks and generalizes these results to networks with homogeneous polynomial activations, linking theory to practical MNIST experiments.
Findings
Theoretical predictions match empirical results on MNIST.
Architectural constraints and the implicit bias of gradient descent both affect GLN performance.
The theory captures a substantial portion of the inductive bias of ReLU networks.
Abstract
Understanding the asymptotic behavior of gradient-descent training of deep neural networks is essential for revealing inductive biases and improving network performance. We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks, gated linear networks (GLNs), and generalize these results to gated networks described by general homogeneous polynomials. We study the implications of our results, focusing first on two-layer GLNs. We then apply our theoretical predictions to GLNs trained on MNIST and show how architectural constraints and the implicit bias of gradient descent affect performance. Finally, we show that our theory captures a substantial portion of the inductive bias of ReLU networks. By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust…
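To make the objects in the abstract concrete, here is a minimal NumPy sketch of a two-layer gated linear network. The abstract does not spell out a definition, so the form used here is an assumption: each hidden unit computes a linear function of the input multiplied by a fixed, untrained gate. The names gln_forward, gates, W, a, and G are hypothetical and chosen only for illustration. The sketch checks two properties that follow from this assumed form: the output is a homogeneous polynomial (degree 2) in the weights, and choosing the gates to match the sign pattern of the first-layer pre-activations reproduces a ReLU network's output.

```python
import numpy as np

def gln_forward(x, gates, W, a):
    """Two-layer gated linear network (assumed illustrative form).

    x:     (n, d) inputs
    gates: (n, h) fixed gating values; these are not trained
    W:     (h, d) first-layer weights
    a:     (h,)   second-layer weights
    """
    hidden = gates * (x @ W.T)  # each unit is linear in x, switched by its gate
    return hidden @ a           # shape (n,)

rng = np.random.default_rng(0)
n, d, h = 5, 3, 4
x = rng.normal(size=(n, d))
W = rng.normal(size=(h, d))
a = rng.normal(size=h)

# Hypothetical fixed gating vectors, independent of the trained weights.
G = rng.normal(size=(h, d))
gates = (x @ G.T > 0).astype(float)

out = gln_forward(x, gates, W, a)

# Homogeneity: scaling every weight by c scales the output by c**2.
scaled = gln_forward(x, gates, 2.0 * W, 2.0 * a)

# With gates taken from the sign of the pre-activations, the GLN output
# coincides with that of a two-layer ReLU network using the same weights.
relu_gates = (x @ W.T > 0).astype(float)
gln_as_relu = gln_forward(x, relu_gates, W, a)
relu_out = np.maximum(x @ W.T, 0.0) @ a
```

In a ReLU network the gates move with the weights during training, whereas in this sketch they stay fixed, which is the simplification that makes the gated model easier to analyze.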
Taxonomy
Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
