ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology

Ding Tang; Lijuan Jiang; Jiecheng Zhou; Minxi Jin; Hengjie Li; Xingcheng Zhang; Zhilin Pei; Jidong Zhai

arXiv:2402.03791 · cs.DC · May 27, 2024 · 1 citation

Open Access

TL;DR

ZeroPP is a novel distributed training framework that eliminates tensor parallelism, combining pipeline and data parallelism to improve efficiency and reduce communication overhead in large-scale model training.

Contribution

ZeroPP introduces a tensor-parallelism-free approach using hybrid pipeline and fully sharded data parallelism, simplifying implementation and enhancing performance.

Findings

01 Achieves up to 33% performance improvement over 3D parallelism.

02 Reduces memory consumption while maintaining training efficiency.

03 Demonstrates scalability on large models.

Abstract

Large-scale models rely heavily on 3D parallelism for distributed training, which utilizes tensor parallelism (TP) as the intra-operator parallelism to partition model states across GPUs. However, TP introduces significant communication overheads and complexity in modifying single-GPU code. In this paper, we propose a TP-free distributed framework ZeroPP, which leverages the hybrid of scalable inter-operator pipeline parallelism and intra-operator fully sharded data parallelism to train models at scale, reducing memory consumption and enabling high training efficiency. Through extensive experimentation, we demonstrate that ZeroPP achieves significant performance gains of up to 33% compared to conventional 3D parallelism while maintaining comparable GPU memory consumption.
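
The TP-free arrangement described in the abstract can be pictured as a two-dimensional device grid: one axis for pipeline stages (inter-operator) and one for fully sharded data parallelism (intra-operator). The sketch below is a minimal illustration of that idea only, not ZeroPP's implementation. The names `Layout`, `pipeline_stages`, `shard_degree`, and `params_per_gpu` are hypothetical, and the rank ordering (ranks of one stage are contiguous) and the even split of parameters across stages and shards are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical 2D device grid: pipeline stages x sharded data-parallel ranks."""

    pipeline_stages: int  # inter-operator (pipeline) parallel degree
    shard_degree: int     # intra-operator (fully sharded data parallel) degree

    @property
    def world_size(self) -> int:
        return self.pipeline_stages * self.shard_degree

    def coords(self, rank: int) -> tuple[int, int]:
        """Map a global rank to (stage index, shard index).

        Assumption: ranks belonging to the same stage are contiguous.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        return divmod(rank, self.shard_degree)

    def shard_group(self, stage: int) -> list[int]:
        """Ranks that jointly hold the sharded model states of one stage."""
        base = stage * self.shard_degree
        return [base + s for s in range(self.shard_degree)]

    def pipeline_group(self, shard: int) -> list[int]:
        """Ranks that pass activations to each other, one per stage."""
        return [st * self.shard_degree + shard for st in range(self.pipeline_stages)]


def params_per_gpu(total_params: int, layout: Layout) -> float:
    """Parameters resident per GPU, assuming an even split over stages and shards."""
    return total_params / layout.world_size


layout = Layout(pipeline_stages=4, shard_degree=8)
print(layout.coords(13))          # stage and shard index of global rank 13
print(layout.shard_group(1))      # ranks sharding stage 1
print(layout.pipeline_group(5))   # ranks forming one pipeline
print(params_per_gpu(6_400_000_000, layout))
```

No tensor-parallel axis appears in the grid: every operator runs unmodified on a single GPU, and only model states are partitioned, which is the property the abstract attributes to the TP-free design.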

Taxonomy

Topics: Smart Grid Security and Resilience · Smart Grid Energy Management · Parallel Computing and Optimization Techniques