Parallel Scaling Law for Language Models

Mouxiang Chen; Binyuan Hui; Zeyu Cui; Jiaxi Yang; Dayiheng Liu; Jianling Sun; Junyang Lin; Zhongxin Liu

arXiv:2505.10475·cs.LG·May 16, 2025



TL;DR

This paper introduces parallel scaling (ParScale), a new inference-efficient method that increases model computation through parallel transformations, enabling similar or better performance with less memory and latency than traditional parameter scaling.

Contribution

The paper proposes ParScale, a novel parallel computation paradigm for language models, along with a new scaling law validated through large-scale experiments.

Findings

1. ParScale achieves comparable performance to parameter scaling with less memory and latency.
2. A new theoretical scaling law relates parallel streams to effective parameter scaling.
3. ParScale can convert pre-trained models into parallel versions with minimal additional training.
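
The second finding can be made concrete with a worked form. This is a sketch only: assume a Chinchilla-style loss $L(N) = (A/N)^{\alpha} + E$ for a model with $N$ parameters, and assume the logarithmic effect of the number of parallel streams $P$ enters as a multiplier on $N$ with a fitted constant $k$. The symbols $A$, $\alpha$, $E$, and $k$ are illustrative placeholders; none of their values appear in this summary.

$$
L(N, P) = \left( \frac{A}{N \, (k \log P + 1)} \right)^{\alpha} + E
$$

Under these assumptions, $P = 1$ gives $\log P = 0$ and recovers the single-stream loss, while a model with $P$ streams behaves like a model with $N (k \log P + 1)$ parameters, that is, effective parameter growth of order $O(\log P)$.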

Abstract

It is commonly believed that scaling language models should commit a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce the third and more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference time. We apply $P$ diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the $P$ outputs. This method, namely parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that a model with $P$ parallel streams is similar to scaling the parameters by $O(\log P)$ while…
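
The three steps in the abstract (transform the input $P$ ways, run the shared model on each copy, aggregate the $P$ outputs with input-dependent weights) can be illustrated with a toy sketch. This is not the authors' implementation: the class `ParallelStreams`, the additive per-stream shift used as the "learnable transformation", the one-layer `tanh` backbone, and the softmax scoring vector are all hypothetical stand-ins chosen to keep the example dependency-free, and no training loop is shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ParallelStreams:
    """Toy illustration: P input transformations, one shared backbone."""

    def __init__(self, dim, num_streams, seed=0):
        rng = random.Random(seed)
        self.P = num_streams
        # Shared backbone weights, reused by every stream (no per-stream copy).
        self.W = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
        # ASSUMPTION: each stream's learnable transformation is an additive shift.
        self.shifts = [[rng.gauss(0, 0.5) for _ in range(dim)]
                       for _ in range(num_streams)]
        # ASSUMPTION: aggregation weights come from a learned scoring vector.
        self.score_w = [rng.gauss(0, 0.5) for _ in range(dim)]

    def backbone(self, x):
        """Stand-in for the language model: one tanh layer."""
        return [math.tanh(v) for v in matvec(self.W, x)]

    def forward(self, x):
        # 1) Apply P different transformations to the same input.
        # 2) Run the shared backbone on each transformed copy.
        outputs = []
        for shift in self.shifts:
            x_p = [xi + si for xi, si in zip(x, shift)]
            outputs.append(self.backbone(x_p))
        # 3) Dynamic aggregation: weights depend on the stream outputs.
        scores = [sum(w * o for w, o in zip(self.score_w, out))
                  for out in outputs]
        weights = softmax(scores)
        aggregated = [
            sum(weights[p] * outputs[p][i] for p in range(self.P))
            for i in range(len(x))
        ]
        return aggregated, weights


model = ParallelStreams(dim=4, num_streams=3)
y, stream_weights = model.forward([0.1, -0.2, 0.3, 0.4])
```

In this sketch the only per-stream state is the shift vector, so adding streams adds compute (one more backbone pass) but very few parameters. With `num_streams=1` the single softmax weight is 1.0 and the result reduces to one ordinary forward pass on the shifted input.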


Code & Models

Repositories

qwenlm/parscale
PyTorch · Official


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling