Revisiting LARS for Large Batch Training Generalization of Neural Networks
Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, and Quoc-Viet Pham

TL;DR
This paper introduces TVLARS, a new large batch training algorithm that replaces warm-up with a sigmoid-like function, improving neural network training and generalization, especially in self-supervised learning.
Contribution
We propose TVLARS, a novel adaptive scaling method that enhances large batch training by replacing warm-up with a configurable function, leading to better exploration and generalization.
Findings
TVLARS outperforms LARS and LAMB in classification tasks by up to 2%.
In self-supervised learning, TVLARS surpasses competitors with up to 10% improvement.
Replacing warm-up with a sigmoid-like function enhances training robustness.
Abstract
This paper explores large batch training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings and reports two observations. First, LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Second, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers and gradually transitioning to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. Notably,…
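The mechanism described in the abstract can be illustrated with a minimal sketch. It assumes the standard LARS trust ratio (weight norm divided by gradient norm) and multiplies it by a decreasing logistic curve that starts high, to encourage early exploration, and settles to 1 so the update reduces to plain LARS later on. The function names and the parameters `delay`, `rate`, `peak`, and `floor` are hypothetical choices made for illustration; this is not the exact schedule defined in the paper.

```python
import math


def lars_trust_ratio(weight, grad, weight_decay=0.0, eps=1e-12):
    """Layer-wise ratio ||w|| / (||g|| + weight_decay * ||w||), as in standard LARS."""
    w_norm = math.sqrt(sum(w * w for w in weight))
    g_norm = math.sqrt(sum(g * g for g in grad))
    if w_norm == 0.0 or g_norm == 0.0:
        # Fall back to no layer-wise scaling when either norm vanishes.
        return 1.0
    return w_norm / (g_norm + weight_decay * w_norm + eps)


def sigmoid_like_decay(step, delay=100, rate=0.05, peak=5.0, floor=1.0):
    """Hypothetical time-varying multiplier (an assumption, not the paper's formula).

    Close to `peak` for step << delay, close to `floor` for step >> delay.
    """
    # Clamp the exponent so math.exp cannot overflow for very large steps.
    z = min(rate * (step - delay), 700.0)
    return floor + (peak - floor) / (1.0 + math.exp(z))


def layer_learning_rate(base_lr, weight, grad, step):
    """Per-layer learning rate: base rate x time-varying factor x trust ratio."""
    return base_lr * sigmoid_like_decay(step) * lars_trust_ratio(weight, grad)


if __name__ == "__main__":
    w, g = [3.0, 4.0], [0.6, 0.8]
    for t in (0, 100, 1000):
        print(t, layer_learning_rate(0.1, w, g, t))
```

In this sketch, `delay` plays the role that the warm-up length plays in a conventional schedule, and `rate` controls how sharply the multiplier falls toward the plain LARS regime.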
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
Methods: LARS · Adam · LAMB
