Revisiting LARS for Large Batch Training Generalization of Neural Networks

Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, and Quoc-Viet Pham

arXiv:2309.14053 · cs.LG · August 28, 2024

TL;DR

This paper introduces TVLARS, a new large batch training algorithm that replaces warm-up with a sigmoid-like function, improving neural network training and generalization, especially in self-supervised learning.

Contribution

We propose TVLARS, a novel adaptive scaling method that enhances large batch training by replacing warm-up with a configurable function, leading to better exploration and generalization.

Findings

01. TVLARS outperforms LARS and LAMB in classification tasks by up to 2%.

02. In self-supervised learning, TVLARS surpasses competitors with up to 10% improvement.

03. Replacing warm-up with a sigmoid-like function enhances training robustness.

Abstract

This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings, uncovering insights. LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, surpassing sharp optimizers and gradually transitioning to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS consistently outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. Notably,…
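
The mechanism described in the abstract can be illustrated with a minimal sketch. The layer-wise ratio below follows the standard LARS form, the weight norm divided by the gradient norm plus a weight-decay term. Everything else is an assumption made for illustration: the abstract only says the schedule is a "configurable sigmoid-like function", so `sigmoid_like_decay`, its parameters (`midpoint`, `steepness`, `floor`, `peak`), and the way it multiplies the ratio in `layer_update` are hypothetical and not taken from the paper.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_trust_ratio(weights, grads, weight_decay=0.0, eps=1e-12):
    """Standard LARS layer-wise ratio: ||w|| / (||g|| + weight_decay * ||w||)."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0  # fall back to an unscaled step for degenerate layers
    return w_norm / (g_norm + weight_decay * w_norm + eps)


def sigmoid_like_decay(step, midpoint=100.0, steepness=0.05, floor=1.0, peak=4.0):
    """Hypothetical schedule: near `peak` early, decaying smoothly to `floor`.

    This is a generic decaying sigmoid, not the paper's exact formula.
    """
    z = steepness * (step - midpoint)
    # Evaluate 1 / (1 + exp(z)) without overflowing for large |z|.
    if z >= 0:
        e = math.exp(-z)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(z))
    return floor + (peak - floor) * s


def layer_update(weights, grads, step, base_lr=0.1, weight_decay=0.0):
    """One SGD step for a single layer with the time-varying layer-wise rate."""
    ratio = lars_trust_ratio(weights, grads, weight_decay)
    lr = base_lr * sigmoid_like_decay(step) * ratio  # assumed combination
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    w, g = [3.0, 4.0], [0.0, 1.0]
    for t in (0, 100, 1000):
        print(t, round(sigmoid_like_decay(t), 4), layer_update(w, g, t))
```

In this sketch the multiplier is largest at the start of training and settles at `floor`, after which the update reduces to a plain LARS step. That mirrors the abstract's description of early exploration followed by a gradual transition to LARS, in contrast to warm-up, which starts with a small rate and increases it.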

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning

Methods: LARS · Adam · LAMB