COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
Akhmed Sakip, Erland Hilman Fuadi, Omar Sayedelahl, Zonghang Li, Jianshu She, Alham Fikri Aji, Steve Liu, Eric Xing, Qirong Ho

TL;DR
COPUS is a system that adaptively optimizes batch size and parallelism strategies during large language model training, improving efficiency by jointly considering hardware throughput and statistical convergence.
Contribution
COPUS introduces an adaptive approach that jointly tunes batch size and parallelism, outperforming configurations that are fixed or optimized independently.
Findings
Converges 3.9-8.0% faster on LLM pre-training.
Supports reconfiguring batch size and parallelism dynamically during training.
Demonstrates gains of up to 11.1% over baselines.
Abstract
Training large language models requires jointly configuring two interdependent aspects of the system: the global batch size, which governs statistical efficiency, and the 3D parallelism strategy, which governs hardware throughput. Existing approaches make these decisions independently: optimization work adapts the batch size to track the evolving critical batch size while keeping parallelism fixed, and systems work selects the fastest parallelism for a given fixed batch size without anticipating that the optimal batch size could change. We show that these decisions are tightly coupled: the throughput-optimal parallelism strategy may shift as the global batch size changes, so any method that fixes one while adapting the other operates with a suboptimal configuration for part of the training run. We present COPUS, a system that adaptively tunes the global batch size, parallelism…
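The coupling claimed in the abstract, that the throughput-optimal parallelism strategy can shift as the global batch size changes, can be illustrated with a toy cost model. The sketch below is not the COPUS method or its cost model. The function names (`strategies`, `step_time`, `best_strategy`), the 8-GPU setup, the micro-batch size of one sample, and the constants `compute`, `tp_comm`, and `allreduce` are all assumptions made up for illustration. The model only captures three qualitative effects: pipeline bubbles grow with the number of stages, tensor parallelism pays a communication cost per micro-batch, and data parallelism pays a gradient all-reduce once per step.

```python
from itertools import product


def strategies(n_gpus):
    """Enumerate (dp, tp, pp) triples whose product equals n_gpus."""
    divisors = [d for d in range(1, n_gpus + 1) if n_gpus % d == 0]
    return [
        (dp, tp, pp)
        for dp, tp, pp in product(divisors, repeat=3)
        if dp * tp * pp == n_gpus
    ]


def step_time(batch, dp, tp, pp, compute=1.0, tp_comm=0.1, allreduce=4.0):
    """Toy estimate of one training step's time (arbitrary units).

    All constants are hypothetical; micro-batch size is assumed to be 1.
    """
    # Micro-batches processed by each data-parallel replica per step.
    micro_batches = batch // dp
    # Per-micro-batch time on one stage: compute is split across tp * pp
    # GPUs, plus a tensor-parallel communication cost that grows with tp.
    t_stage = compute / (tp * pp) + tp_comm * (tp - 1) / tp
    # Pipeline schedule: pp - 1 extra slots account for the pipeline bubble.
    pipeline = (micro_batches + pp - 1) * t_stage
    # Gradient all-reduce across replicas, paid once per step on the
    # 1 / (tp * pp) shard of parameters each GPU holds.
    grad_sync = allreduce * (dp - 1) / dp / (tp * pp)
    return pipeline + grad_sync


def best_strategy(batch, n_gpus=8):
    """Return the (dp, tp, pp) triple with the lowest toy step time."""
    feasible = [s for s in strategies(n_gpus) if batch % s[0] == 0]
    return min(feasible, key=lambda s: step_time(batch, *s))


if __name__ == "__main__":
    for batch in (8, 512):
        print(batch, best_strategy(batch))
```

With these made-up constants, the preferred strategy moves from pure tensor parallelism, `(1, 8, 1)`, at a global batch size of 8 to pure pipeline parallelism, `(1, 1, 8)`, at 512. At small batch sizes the pipeline bubble dominates, while at large batch sizes the per-micro-batch tensor-parallel communication cost does. A parallelism choice fixed at the start of training would therefore be suboptimal once the batch size grows, which is the situation the abstract describes.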
