Logit Attenuating Weight Normalization
Aman Gupta, Rohan Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Sathiya Keerthi

TL;DR
This paper introduces LAWN, a normalization technique that constrains layer weight norms to improve the adaptivity and generalization of deep networks, especially with large batch training.
Contribution
LAWN is a novel method that can be added to any optimizer to control logits by constraining layer weight norms, enhancing training adaptability and generalization.
Findings
LAWN improves generalization in large-scale image classification.
LAWN enhances optimizer performance with large batch sizes.
LAWN significantly boosts Adam's effectiveness in training deep networks.
Abstract
Over-parameterized deep networks trained using gradient-based optimizers are a popular choice for solving classification and ranking problems. Without appropriately tuned regularization or weight decay, such networks have the tendency to make output scores (logits) and network weights large, causing training loss to become too small and the network to lose its adaptivity (ability to move around) in the parameter space. Although regularization is typically understood from an overfitting perspective, we highlight its role in making the network more adaptive and enabling it to escape more easily from weights that generalize poorly. To provide such a capability, we propose a method called Logit Attenuating Weight Normalization (LAWN), that can be stacked onto any gradient-based optimizer. LAWN controls the logits by constraining the weight norms of layers in the final homogeneous…
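The idea of constraining a layer's weight norm on top of an existing optimizer can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes plain SGD as the base optimizer, a single weight vector standing in for a layer, and a simple rescaling back to a fixed L2 norm after each step. The helper names (`l2_norm`, `project_to_norm`, `constrained_sgd_step`) are hypothetical, and the details of which layers LAWN constrains and when are not reproduced here.

```python
import math

def l2_norm(w):
    """Euclidean norm of a weight vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in w))

def project_to_norm(w, target_norm):
    """Rescale w so its L2 norm equals target_norm (zero vector is returned unchanged)."""
    n = l2_norm(w)
    if n == 0.0:
        return list(w)
    scale = target_norm / n
    return [x * scale for x in w]

def constrained_sgd_step(w, grad, lr, target_norm):
    """One plain SGD step, then rescale back to the fixed norm (illustrative assumption)."""
    stepped = [x - lr * g for x, g in zip(w, grad)]
    return project_to_norm(stepped, target_norm)

# Hypothetical toy values: freeze the norm at its current value, then take a step.
w = [3.0, 4.0]
fixed_norm = l2_norm(w)
grad = [1.0, -2.0]
w_new = constrained_sgd_step(w, grad, lr=0.1, target_norm=fixed_norm)
```

Because the norm is held fixed, the update can change the direction of the weights but cannot grow their scale, which is the mechanism by which a norm constraint keeps the logits of a homogeneous layer from growing without bound.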
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Advanced Image and Video Retrieval Techniques
Methods: Weight Normalization · Adam
