Synergistic Tensor and Pipeline Parallelism

Mengshi Qi; Jiaxuan Peng; Jie Zhang; Juan Zhu; Yong Li; Huadong Ma

arXiv:2510.27257·cs.DC·November 3, 2025

TL;DR

This paper introduces a scheduling method that combines tensor parallelism (TP) and pipeline parallelism (PP) to reduce communication and synchronization overheads, improving training throughput by up to 12% for LLMs and up to 16% for MLLMs.

Contribution

It proposes a synergistic schedule that decouples and braids computation units, effectively eliminating TP bubbles and reducing PP bubbles for more efficient distributed training.
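To make the idea of braiding concrete, here is a toy timeline model in Python. It is a minimal sketch under stated assumptions, not the paper's schedule: each computation unit is assumed to consist of a compute phase followed by a TP collective, there is one compute stream and one communication stream, and the unit names and durations (`f_attn`, `b_mlp`, and so on) are hypothetical. Braiding here simply alternates units from two independent passes so that one pass's communication can overlap the other pass's compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A hypothetical fine-grained unit: compute followed by a TP collective."""
    name: str
    compute: float  # time on the compute stream
    comm: float     # time on the communication stream


def serial_time(units):
    """Run units back to back; each collective blocks the next compute."""
    return sum(u.compute + u.comm for u in units)


def braid(a, b):
    """Alternate units from two independent passes (e.g. one forward, one backward)."""
    out = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            out.append(("A", a[i]))
        if i < len(b):
            out.append(("B", b[i]))
    return out


def braided_time(a, b):
    """Simulate the braided order with one compute and one comm stream.

    A unit's compute waits for the compute stream and for the previous
    collective of its own pass; its collective waits for its own compute
    and for the comm stream.
    """
    compute_free = 0.0
    comm_free = 0.0
    ready = {"A": 0.0, "B": 0.0}  # when each pass may start its next unit
    for chain, unit in braid(a, b):
        start = max(compute_free, ready[chain])
        compute_free = start + unit.compute
        comm_start = max(compute_free, comm_free)
        comm_free = comm_start + unit.comm
        ready[chain] = comm_free
    return max(compute_free, comm_free)


# Hypothetical units: a forward pass of one microbatch, a backward pass of another.
forward = [Unit("f_attn", 2.0, 1.0), Unit("f_mlp", 2.0, 1.0)]
backward = [Unit("b_mlp", 2.0, 1.0), Unit("b_attn", 2.0, 1.0)]

print("serial :", serial_time(forward + backward))
print("braided:", braided_time(forward, backward))
```

In this toy setting the braided order hides every collective except the last one behind compute from the other pass. Real schedules must also respect memory limits and cross-stage dependencies, which this sketch ignores.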

Findings

01 Up to 12% throughput improvement for LLMs
02 Up to 16% throughput improvement for MLLMs
03 Effective reduction of communication and synchronization overheads

Abstract

In machine learning systems, hybrid model parallelism combining tensor parallelism (TP) and pipeline parallelism (PP) has become the dominant solution for distributed training of Large Language Models (LLMs) and Multimodal LLMs (MLLMs). However, TP introduces significant collective communication overheads, while PP suffers from synchronization inefficiencies such as pipeline bubbles. Existing works primarily address these challenges from isolated perspectives, focusing either on overlapping TP communication or on flexible PP scheduling to mitigate pipeline bubbles. In this paper, we propose a new synergistic tensor and pipeline parallelism schedule that simultaneously reduces both types of bubbles. Our proposed schedule decouples the forward and backward passes in PP into fine-grained computation units, which are then braided to form a composite computation sequence. This…
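For a sense of scale on pipeline bubbles, the sketch below computes the standard idle-time fraction for a simple synchronous pipeline schedule (GPipe or 1F1B style) with `p` stages and `m` microbatches, assuming every stage takes equal time per microbatch. This is the textbook baseline formula, not a result from this paper, and the function name is chosen here for illustration.

```python
def pipeline_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of each device's time spent idle in a simple synchronous pipeline.

    Assumes equal per-stage time for every microbatch. With p stages and m
    microbatches, each device is busy for m slots out of m + p - 1, so the
    idle fraction is (p - 1) / (m + p - 1).
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be positive")
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)


# More microbatches per step shrink the bubble, at the cost of activation memory.
for m in (4, 8, 32):
    print(f"p=4, m={m}: bubble fraction = {pipeline_bubble_fraction(4, m):.3f}")
```

The formula shows why bubbles matter most when the number of microbatches is small relative to the pipeline depth, which is the regime where scheduling changes have the most room to help.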

Videos

Synergistic Tensor and Pipeline Parallelism · SlidesLive

Taxonomy

Topics: Tensor decomposition and applications · Topic Modeling · Advanced Neural Network Applications