Scaling Law for Language Models Training Considering Batch Size

Xian Shuai; Yiding Wang; Yimeng Wu; Xin Jiang; Xiaozhe Ren

arXiv:2412.01505 · cs.CL · December 3, 2024

TL;DR

This paper empirically investigates how global batch size impacts large language model training, establishing scaling laws that guide optimization under resource constraints.

Contribution

It introduces new empirical scaling laws relating batch size, model size, and training data, validated through extensive experiments on models up to 2.6 billion parameters.

Findings

1. Batch size significantly influences convergence and generalization.
2. Scaling laws enable resource-efficient training strategies.
3. Extrapolation validates the predictive power of the proposed laws.

Abstract

Large language models (LLMs) have made remarkable advances in recent years, with scaling laws playing a critical role in this rapid progress. In this paper, we empirically investigate how a critical hyper-parameter, i.e., the global batch size, influences the LLM training process. We begin by training language models ranging from 125 million to 2.6 billion parameters, using up to 300 billion high-quality tokens. Through these experiments, we establish a basic scaling law on model size and training data amount. We then examine how varying batch sizes and learning rates affect the convergence and generalization of these models. Our analysis yields batch size scaling laws under two different cases: with a fixed compute budget, and with a fixed amount of training data. Extrapolation experiments on models of increasing sizes validate our predicted laws, which provides guidance for…
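
The abstract refers to a basic scaling law on model size and training data amount, but this excerpt does not state its functional form. As a rough illustration only, the sketch below assumes a generic single-variable power law, y = a * x^b, which is a common shape in the scaling-law literature, and estimates a and b by least squares in log-log space. The function name `fit_power_law`, the intermediate model sizes, and the coefficient values are hypothetical; the data points are synthetic and are not results from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Assumed generic form for illustration; not the paper's fitted law.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    var_x = sum((u - mean_x) ** 2 for u in log_x)
    if var_x == 0:
        raise ValueError("x values must not all be identical")
    cov_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = cov_xy / var_x                   # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


# Synthetic, noise-free points generated from hypothetical a = 10, b = -0.1.
# Only the endpoints (125M and 2.6B parameters) come from the abstract;
# the intermediate sizes are made up for the example.
model_sizes = [1.25e8, 3.5e8, 1.3e9, 2.6e9]
losses = [10.0 * n ** -0.1 for n in model_sizes]

a_hat, b_hat = fit_power_law(model_sizes, losses)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

A fit of this kind is typically made on small models and then checked by extrapolating to larger ones, which is the role the abstract assigns to its extrapolation experiments. Real loss curves are noisy and usually include an irreducible-loss term, so a nonlinear fit would be needed in practice.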

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling