TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks
Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Chencan Wu, Yong Li, Xiaokui Xiao, Wei Lin, Jialin Li

TL;DR
TAPAS is an automatic tensor parallelism framework that exploits repeated substructures in large neural networks to cut strategy search time significantly, while deriving strategies that match or surpass expert-engineered ones.
Contribution
TAPAS introduces a divide-and-conquer approach that exploits repeated substructures in neural networks, making automatic derivation of tensor parallel strategies fast and scalable.
Findings
TAPAS outperforms existing frameworks by up to 160x in search speed.
Derived strategies are as good as or better than expert-engineered solutions.
The search runs at sub-linear complexity with respect to model size, so it scales to large models.
Abstract
Tensor parallelism is an essential technique for distributed training of large neural networks. However, automatically determining an optimal tensor parallel strategy is challenging due to the gigantic search space, which grows exponentially with model size and tensor dimension. This prohibits the adoption of auto-parallel systems on larger models. We observe that neural networks usually contain repeated substructures, and build an automatic parallelism framework named TAPAS that eliminates redundant search efforts. TAPAS employs a divide-and-conquer approach that efficiently folds the search space by identifying those unique substructures. As a result, it runs at sub-linear complexity concerning the model size, making it a scalable solution for training large-scale networks. Our evaluations demonstrate that TAPAS outperforms the state-of-the-art automatic parallelism frameworks by up…
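To make the idea of folding the search space concrete, here is a minimal Python sketch, not the TAPAS implementation. It assumes a model can be described as a list of layers, each with a list of ops, and that two layers are "the same substructure" when their op types and tensor shapes match. The names layer_signature, search_strategy, derive_plan, and make_block are all hypothetical, and search_strategy is a toy stand-in (split each matmul along its larger dimension) for whatever expensive search a real system would run.

```python
def layer_signature(layer):
    """Structural fingerprint of a layer: op types and shapes, ignoring names."""
    return tuple((op["type"], tuple(op["shape"])) for op in layer["ops"])


def search_strategy(signature):
    """Toy stand-in for an expensive strategy search (hypothetical rule).

    Splits each matmul along its larger dimension and replicates everything else.
    """
    strategy = []
    for op_type, shape in signature:
        if op_type == "matmul":
            axis = max(range(len(shape)), key=lambda i: shape[i])
            strategy.append(("split", axis))
        else:
            strategy.append(("replicate", None))
    return tuple(strategy)


def derive_plan(layers):
    """Search once per unique substructure, then reuse the result for repeats."""
    cache = {}  # signature -> strategy found for that substructure
    plan = {}   # layer name -> strategy
    for layer in layers:
        sig = layer_signature(layer)
        if sig not in cache:
            cache[sig] = search_strategy(sig)
        plan[layer["name"]] = cache[sig]
    return plan, len(cache)


def make_block(name, hidden=1024):
    """A simplified feed-forward block, repeated many times in the toy model."""
    return {
        "name": name,
        "ops": [
            {"type": "matmul", "shape": (hidden, 4 * hidden)},
            {"type": "gelu", "shape": (4 * hidden,)},
            {"type": "matmul", "shape": (4 * hidden, hidden)},
        ],
    }


# Toy model: one embedding, 24 identical blocks, one output head.
model = [{"name": "embed", "ops": [{"type": "embedding", "shape": (32000, 1024)}]}]
model += [make_block(f"block_{i}") for i in range(24)]
model.append({"name": "head", "ops": [{"type": "matmul", "shape": (1024, 32000)}]})

plan, num_searches = derive_plan(model)
print(f"{len(model)} layers, {num_searches} strategy searches")
```

In this toy model the number of searches depends on the number of distinct substructures rather than on depth, so adding more identical blocks adds no search work. That is the intuition behind the sub-linear scaling claim, under the assumption that repeated layers can share one strategy.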
Taxonomy
Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications
Methods: Pruning
