Path-conditioned training: a principled way to rescale ReLU neural networks
Arthur Lebeurrier, Titouan Vayer, Rémi Gribonval

TL;DR
This paper introduces a principled rescaling method for ReLU neural networks, built on the path-lifting framework, that aligns a kernel in the path-lifting space with a chosen reference in order to improve training efficiency.
Contribution
It presents a geometrically motivated rescaling criterion and an efficient algorithm, leveraging the path-lifting framework to enhance neural network training.
Findings
Rescaling can significantly speed up training.
The method effectively aligns kernels in the path-lifting space.
Architecture and initialization scale jointly influence the outcome of the rescaling.
Abstract
Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled weights implement the same function, the training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters, whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate…
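The rescaling symmetry mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes a one-hidden-layer ReLU network with biases. The helper names (`forward`, `rescale_hidden`) and the random scaling factors are hypothetical and chosen for illustration only. This is not the paper's rescaling criterion or algorithm. It only shows that multiplying the incoming weights and bias of each hidden neuron by a positive factor, and dividing its outgoing weights by the same factor, leaves the network function unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(params, x):
    # One-hidden-layer ReLU network: f(x) = W2 relu(W1 x + b1) + b2
    W1, b1, W2, b2 = params
    return W2 @ relu(W1 @ x + b1) + b2

def rescale_hidden(params, lam):
    # Hypothetical helper: scale neuron i's incoming weights and bias by
    # lam[i] > 0 and its outgoing weights by 1 / lam[i].
    # Since relu(c * z) = c * relu(z) for c > 0, the output is unchanged.
    W1, b1, W2, b2 = params
    lam = np.asarray(lam, dtype=float)
    if np.any(lam <= 0):
        raise ValueError("rescaling factors must be strictly positive")
    return (lam[:, None] * W1, lam * b1, W2 / lam[None, :], b2)

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 5, 2
params = (
    rng.standard_normal((d_hidden, d_in)),
    rng.standard_normal(d_hidden),
    rng.standard_normal((d_out, d_hidden)),
    rng.standard_normal(d_out),
)

# Arbitrary positive factors, one per hidden neuron (illustrative choice).
lam = rng.uniform(0.1, 10.0, size=d_hidden)
rescaled = rescale_hidden(params, lam)

x = rng.standard_normal(d_in)
same_output = np.allclose(forward(params, x), forward(rescaled, x))
same_weights = np.allclose(params[0], rescaled[0])
print("same function value:", same_output)
print("same first-layer weights:", same_weights)
```

The two parameter sets differ while computing the same function, which is why the choice of representative within this equivalence class can matter for gradient-based training.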
Taxonomy
Topics: Machine Learning and ELM · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
