Scalable Optimization in the Modular Norm
Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein

TL;DR
This paper introduces the modular norm, a norm on the full weight space of any neural network architecture that is defined recursively alongside the architecture itself. Normalizing optimizer updates in this norm makes the learning rate transferable across width and depth, a claim the paper supports with both theoretical and practical results.
Contribution
The paper generalizes the per-layer "natural norm" to the full weight space of any neural network architecture, which makes it possible to normalize optimizer updates and to analyze the Lipschitz continuity of the gradient.
Findings
Normalizing updates in the modular norm lets learning rates transfer across network width and depth.
The gradient of a network built from atomic modules is Lipschitz-continuous in the modular norm.
A practical implementation is available in the Modula Python package.
Abstract
To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively in tandem with the network architecture itself. We show that the modular norm has several promising applications. On the practical side, the modular norm can be used to normalize the updates of any base optimizer so that the learning rate becomes transferable across width and depth. This means that the user does not need to compute…
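As a rough illustration of what "normalizing the updates of a base optimizer" can look like for a single linear layer, here is a minimal NumPy sketch. It assumes the layer's natural norm is the spectral norm rescaled by sqrt(d_in / d_out); the excerpt above does not state that formula, so treat it as an assumption. The helper names `rms_to_rms_norm` and `normalize_update` are hypothetical and are not part of the Modula package, and the recursive combination of per-layer norms across a whole architecture is not shown.

```python
import numpy as np


def rms_to_rms_norm(matrix: np.ndarray) -> float:
    """Spectral norm rescaled by sqrt(d_in / d_out).

    Assumption: this stands in for the "natural norm" of one linear layer.
    """
    d_out, d_in = matrix.shape
    spectral = np.linalg.norm(matrix, ord=2)  # largest singular value
    return float(np.sqrt(d_in / d_out) * spectral)


def normalize_update(update: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale a raw update so that its norm is (approximately) one."""
    return update / (rms_to_rms_norm(update) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    learning_rate = 0.1  # the same value is reused at every width
    for width in (64, 256, 1024):
        weight = rng.standard_normal((width, width)) / np.sqrt(width)
        # Stand-in for the step proposed by any base optimizer.
        raw_update = rng.standard_normal((width, width))
        step = normalize_update(raw_update)
        weight -= learning_rate * step
        print(width, rms_to_rms_norm(step))
```

After normalization the step has the same size in the chosen norm at every width, which is the property that lets a single learning rate be reused as the layer grows.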
Taxonomy
Topics: Manufacturing Process and Optimization · Mathematical Control Systems and Analysis
Methods: Balanced Selection
