Tilting the playing field: Dynamical loss functions for machine learning
Miguel Ruiz-Garcia, Ge Zhang, Samuel S. Schoenholz, Andrea J. Liu

TL;DR
This paper introduces cyclically evolving loss functions that improve training success in underparameterized networks and enhance generalization in overparameterized networks by dynamically altering the loss landscape during training.
Contribution
The paper proposes dynamical loss functions that oscillate during training to improve optimization and generalization, and analyzes the resulting bifurcation cascades and landscape dynamics.
Findings
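In underparameterized networks, dynamical loss functions can lead to successful training where the standard cross-entropy loss fails to find deep minima.
As the loss function oscillates, valleys in the loss landscape widen and deepen, then narrow and rise over each cycle.
In overparameterized networks, dynamical loss functions can lead to better generalization.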
Abstract
We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find deep minima of the standard cross-entropy loss. In overparameterized networks, dynamical loss functions can lead to better generalization. Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss. In particular, as the loss function oscillates, instabilities develop in the form of bifurcation cascades, which we study using the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and deepen, and then narrow and rise as the loss landscape changes during a cycle. As the landscape narrows, the learning rate becomes too large and the network…
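The idea of a loss that cyclically emphasizes one class at a time can be illustrated with a minimal sketch. The abstract does not specify the schedule, so everything here is an assumption for illustration: the triangular-wave shape, the normalization of the weights, and the names `class_weights`, `weighted_cross_entropy`, `period`, and `amplitude` are all hypothetical rather than taken from the paper.

```python
import math

def class_weights(step, num_classes, period, amplitude):
    """Hypothetical schedule that emphasizes one class at a time.

    Assumption: within each window of `period` steps, the emphasized
    class's raw weight rises linearly from 1 to `amplitude` and falls
    back to 1 (a triangular wave); all other classes stay at 1. The
    weights are then rescaled so they sum to `num_classes`.
    """
    cycle, phase = divmod(step, period)
    active = cycle % num_classes           # class emphasized in this window
    frac = phase / period                  # position in the window, in [0, 1)
    tri = 1.0 - abs(2.0 * frac - 1.0)      # triangular wave: 0 -> 1 -> 0
    raw = [1.0] * num_classes
    raw[active] = 1.0 + (amplitude - 1.0) * tri
    total = sum(raw)
    return [num_classes * w / total for w in raw]

def weighted_cross_entropy(logits, labels, weights):
    """Mean per-sample cross-entropy, each term scaled by its class weight."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += weights[y] * (log_norm - z[y])
    return total / len(labels)

# Usage: recompute the weights at every training step, then evaluate the loss.
weights = class_weights(step=5, num_classes=3, period=10, amplitude=5.0)
loss = weighted_cross_entropy([[2.0, 0.5, -1.0]], [0], weights)
```

With `amplitude` set to 1 the weights stay uniform and the sketch reduces to the standard cross-entropy loss, which makes it easy to compare the two settings under the same training loop.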
Taxonomy
Topics: Neural Networks and Applications · Neural Networks and Reservoir Computing
