TL;DR
This paper identifies nearly flat regions in neural network loss landscapes, called channels to infinity, along which the output weights of certain neurons diverge while the network function converges to one that contains a gated linear unit.
Contribution
It characterizes channels to infinity in neural network loss landscapes, linking the divergence of parameters along a channel to the emergence of gated linear units and describing the geometry of the channels.
Findings
Gradient-based optimizers frequently reach channels to infinity.
Channels resemble flat minima but involve diverging parameters.
Gated linear units naturally emerge at the end of these channels.
Abstract
The loss landscapes of neural networks contain minima and saddle points that may be connected in flat regions or appear in isolation. We identify and characterize a special structure in the loss landscape: channels along which the loss decreases extremely slowly, while the output weights of at least two neurons, $a_i$ and $a_j$, diverge to $\pm$infinity, and their input weight vectors, $\mathbf{w}_i$ and $\mathbf{w}_j$, become equal to each other. At convergence, the two neurons implement a gated linear unit: $a_i\sigma(\mathbf{w}_i \cdot \mathbf{x}) + a_j\sigma(\mathbf{w}_j \cdot \mathbf{x}) \rightarrow \sigma(\mathbf{w} \cdot \mathbf{x}) + (\mathbf{v} \cdot \mathbf{x})\,\sigma'(\mathbf{w} \cdot \mathbf{x})$. Geometrically, these channels to infinity are asymptotically parallel to symmetry-induced lines of critical points. Gradient flow solvers, and related optimization methods like SGD…
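The limit described in the abstract can be checked numerically. The sketch below is not the authors' code: it assumes a tanh activation and a hypothetical parametrization in which a small parameter `eps` controls both the divergence of the output weights (`c/2 ± 1/(2*eps)`) and the merging of the input weights (`w ± eps*v`). The names `eps`, `c`, `two_neuron_output`, and `gated_linear_unit` are illustrative and do not come from the paper. Under these assumptions, the sum of the two neurons approaches `c*sigma(w·x) + (v·x)*sigma'(w·x)` as `eps` shrinks.

```python
import numpy as np

def sigma(z):
    # Assumed activation for this sketch; any smooth activation would do.
    return np.tanh(z)

def sigma_prime(z):
    # Derivative of tanh.
    return 1.0 - np.tanh(z) ** 2

def two_neuron_output(x, w, v, c, eps):
    # Hypothetical parametrization of a point along a channel:
    # output weights diverge with opposite signs as eps -> 0,
    # while both input weight vectors converge to w.
    a_i = c / 2 + 1 / (2 * eps)
    a_j = c / 2 - 1 / (2 * eps)
    w_i = w + eps * v
    w_j = w - eps * v
    return a_i * sigma(x @ w_i) + a_j * sigma(x @ w_j)

def gated_linear_unit(x, w, v, c):
    # Limiting function: a standard neuron plus a linear term (v . x)
    # gated by the derivative of the activation.
    z = x @ w
    return c * sigma(z) + (x @ v) * sigma_prime(z)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))   # 100 random inputs in 3 dimensions
w = rng.normal(size=3)
v = rng.normal(size=3)
c = 1.0

eps_values = [1e-1, 1e-2, 1e-3]
errors = [
    float(np.max(np.abs(two_neuron_output(x, w, v, c, eps)
                        - gated_linear_unit(x, w, v, c))))
    for eps in eps_values
]

for eps, err in zip(eps_values, errors):
    print(f"eps={eps:g}  max |two neurons - GLU| = {err:.3e}")
```

In this parametrization the symmetric difference cancels the first-order error, so the gap to the gated linear unit shrinks roughly quadratically in `eps` even though the individual output weights grow like `1/eps`.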