Pier: Efficient Large Language Model pretraining with Relaxed Global Communication

Shuyuan Fan; Zhao Zhang

arXiv:2511.17849 · cs.DC · December 1, 2025

Open Access

TL;DR

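Pier is an optimizer built on DiLoCo that relaxes global communication during large language model pretraining: an inner optimizer runs within groups of processors, and only the outer optimizer communicates globally. Momentum warmup and momentum decay on the outer optimizer preserve convergence and model quality, and a scalable system architecture supports complex parallelization strategies.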

Contribution

The paper proposes Pier, an optimizer with relaxed global communication that speeds up LLM pretraining without sacrificing model quality, and evaluates it on the GPT model family (small, medium, XL, and 7B) with the OpenWebText dataset and thirteen downstream tasks.

Findings

01. Speeds up GPT-2 XL training by up to 3.7x on 256 GPUs

02. Reduces GPT-2 7B training time by 54.5% with combined parallel strategies

03. Maintains validation loss and downstream performance despite acceleration
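
The first two findings use different units (a speedup factor and a percentage reduction in training time). The short sketch below converts between the two, assuming "speedup" means baseline time divided by Pier time and "reduction" is measured against the same baseline; the function names are illustrative and do not come from the paper. The two findings concern different models and parallelization setups, so the converted figures are for orientation only, not a direct comparison.

```python
def time_reduction_from_speedup(speedup: float) -> float:
    """Fraction of baseline time saved, given speedup = t_baseline / t_new."""
    return 1.0 - 1.0 / speedup


def speedup_from_time_reduction(reduction: float) -> float:
    """Speedup factor t_baseline / t_new, given the fraction of time saved."""
    return 1.0 / (1.0 - reduction)


# A 3.7x speedup corresponds to roughly a 73% shorter run.
print(f"{time_reduction_from_speedup(3.7):.1%}")
# A 54.5% shorter run corresponds to roughly a 2.2x speedup.
print(f"{speedup_from_time_reduction(0.545):.2f}x")
```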

Abstract

Global communication, such as all-reduce and all-gather, is the prominent performance bottleneck in large language model (LLM) pretraining. To address this issue, we present Pier, an efficient and scalable optimizer with relaxed global communication. Pier is built upon DiLoCo, which leverages an inner optimizer within groups of processors and an outer optimizer that requires global communication. To preserve convergence and model performance, Pier incorporates two key techniques for the outer optimizer: momentum warmup and momentum decay. Pier employs an efficient and scalable system architecture to enable complex parallelization strategies in LLM pretraining. We examine the model performance and runtime reduction of Pier using the GPT model family (e.g., small, medium, XL, and 7B) and the OpenWebText dataset with a suite of thirteen downstream tasks. With data parallel strategy,…
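
The inner/outer structure described in the abstract can be sketched on a toy problem. This is a minimal DiLoCo-style illustration, not Pier's implementation. Each group runs several local steps with no global communication, then one averaging step (standing in for a global all-reduce) yields a pseudo-gradient that a Nesterov-momentum outer optimizer applies. Several parts are assumptions made for illustration: the inner optimizer is plain SGD rather than the AdamW typically used with DiLoCo, the loss is a toy quadratic, and `momentum_schedule` is a hypothetical linear warmup and decay, since this summary does not give the schedules Pier actually uses.

```python
import random


def momentum_schedule(step, total_steps, peak=0.9, warmup=5, decay=5):
    """Hypothetical outer-momentum schedule (assumption, not from the paper):
    linear warmup to `peak`, hold, then linear decay to 0 at the last step."""
    if step < warmup:
        return peak * (step + 1) / warmup
    steps_left = total_steps - 1 - step
    if steps_left < decay:
        return peak * steps_left / decay
    return peak


def inner_steps(theta, target, lr, steps):
    """Local SGD on 0.5 * (theta - target)^2 per coordinate; no communication."""
    for _ in range(steps):
        theta = [t - lr * (t - g) for t, g in zip(theta, target)]
    return theta


def train(num_groups=4, outer_steps=20, inner_per_outer=8,
          inner_lr=0.1, outer_lr=0.7, dim=3, seed=0):
    rng = random.Random(seed)
    # Each group has a slightly different optimum, standing in for data shards.
    targets = [[1.0 + rng.uniform(-0.1, 0.1) for _ in range(dim)]
               for _ in range(num_groups)]
    theta = [0.0] * dim
    velocity = [0.0] * dim
    for step in range(outer_steps):
        # Inner phase: every group starts from the shared parameters.
        local = [inner_steps(list(theta), tgt, inner_lr, inner_per_outer)
                 for tgt in targets]
        # Outer phase: the only global communication (an all-reduce in practice).
        avg = [sum(col) / num_groups for col in zip(*local)]
        delta = [t - a for t, a in zip(theta, avg)]  # pseudo-gradient
        mu = momentum_schedule(step, outer_steps)
        velocity = [mu * v + d for v, d in zip(velocity, delta)]
        # Nesterov-style update on the pseudo-gradient.
        theta = [t - outer_lr * (d + mu * v)
                 for t, d, v in zip(theta, delta, velocity)]
    return theta, targets


theta, targets = train()
print("final parameters:", [round(t, 4) for t in theta])
```

With `inner_per_outer=8`, the groups synchronize once every eight local steps instead of every step, which is where the reduction in global communication comes from in this kind of scheme.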

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications