ATP: Adaptive Tensor Parallelism for Foundation Models
Shenggan Cheng, Ziming Liu, Jiangsu Du, Yang You

TL;DR
ATP is an adaptive tensor parallelism framework that automatically selects the tensor parallel strategy best suited to the hardware interconnect when training foundation models, with reported training performance gains of up to 64%.
Contribution
The paper presents ATP, a novel framework that automatically selects optimal tensor parallel strategies based on interconnection topologies, enhancing training efficiency for large models.
Findings
Achieves up to 64% training performance improvement.
Communication overhead decreases as the system scales.
Outperforms existing tensor parallelism methods across different setups.
Abstract
Foundation models have impressive performance and generalization capabilities across a wide range of applications. The increasing size of the models introduces great challenges for the training. Tensor parallelism is a critical technique that is currently used in almost all foundation model training and has a significant impact on overall training performance. However, current tensor parallelism in machine learning frameworks misses optimization opportunities in fitting various interconnection topologies. In this work, we present ATP, an adaptive tensor parallelism framework for foundation models, which can automatically select the optimal parallel strategy on different interconnections. We propose column- and row-first tensor parallelism based on 2D device meshes and construct a search space. Combined with the hierarchical communication matrix, ATP can identify the optimal strategy in…
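To make the idea of searching over 2D device meshes concrete, here is a minimal sketch in Python. It enumerates every way to arrange a number of devices as a (d1, d2) mesh and picks the arrangement with the lowest communication cost. The cost model is an assumption made for illustration only (a ring all-reduce along each mesh axis, with the message sharded by the other axis, and one bandwidth value per axis). It is not ATP's hierarchical communication matrix or its actual cost model, and the names mesh_shapes, toy_comm_cost, and pick_mesh are hypothetical.

```python
from typing import List, Tuple


def mesh_shapes(num_devices: int) -> List[Tuple[int, int]]:
    """Return every (d1, d2) with d1 * d2 == num_devices."""
    return [
        (d, num_devices // d)
        for d in range(1, num_devices + 1)
        if num_devices % d == 0
    ]


def ring_allreduce_time(group_size: int, message_size: float, bandwidth: float) -> float:
    """Standard ring all-reduce estimate: each device moves 2*(n-1)/n of the message."""
    if group_size == 1:
        return 0.0  # a group of one device needs no communication
    return 2.0 * (group_size - 1) / group_size * message_size / bandwidth


def toy_comm_cost(
    shape: Tuple[int, int],
    bw_inner: float,
    bw_outer: float,
    message_size: float = 1.0,
) -> float:
    """Toy cost (assumption): axis d2 uses fast inner links, axis d1 uses outer links.

    The message along each axis is assumed to be sharded by the other axis.
    """
    d1, d2 = shape
    outer = ring_allreduce_time(d1, message_size / d2, bw_outer)
    inner = ring_allreduce_time(d2, message_size / d1, bw_inner)
    return outer + inner


def pick_mesh(num_devices: int, bw_inner: float, bw_outer: float) -> Tuple[int, int]:
    """Search the mesh shapes and return the one with the lowest toy cost."""
    return min(
        mesh_shapes(num_devices),
        key=lambda shape: toy_comm_cost(shape, bw_inner, bw_outer),
    )


if __name__ == "__main__":
    # Fast inner links, slow outer links: keep all communication on the inner axis.
    print(pick_mesh(8, bw_inner=10.0, bw_outer=1.0))
    # Uniform bandwidth: a balanced 2D mesh has the lower toy cost.
    print(pick_mesh(8, bw_inner=1.0, bw_outer=1.0))
```

Under this toy model the preferred mesh shape changes with the bandwidth ratio between the two axes, which is the kind of topology dependence that motivates an adaptive search.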
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Educational Methods and Media Use
