Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism
Shengwei Li, Zhiquan Lai, Dongsheng Li, Yanqi Hao, Weijie Liu, Keshi Ge, Xiaoge Deng, Kai Lu

TL;DR
Oases introduces an automated tensor model parallelism method with overlapped communication and computation, significantly improving large-scale model training efficiency on commodity servers.
Contribution
The paper presents Oases, an automated tensor model parallelism (TMP) approach that overlaps communication with computation. It combines a fine-grained training operation schedule with a cost-aware planner that searches for the best partitioning strategy.
Findings
Achieves 1.01-1.48x speedup over state-of-the-art methods.
Up to 1.95x speedup over Megatron.
Effective on various models and commodity clusters.
Abstract
Deep learning is experiencing a rise in large-scale models. Training these models is costly, prompting researchers to train them on commodity servers that more researchers can access. The massive number of parameters necessitates model-parallel training methods. Existing studies focus on training with pipeline model parallelism. However, tensor model parallelism (TMP) becomes inevitable as model size keeps increasing, and its frequent data-dependent communication and computation operations significantly reduce training efficiency. In this paper, we present Oases, an automated TMP method with overlapped communication to accelerate large-scale model training on commodity servers. Oases proposes a fine-grained training operation schedule to maximize the overlap of communication and computation that have data dependence. Additionally, we design the…
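The overlap idea in the abstract can be illustrated with a toy timeline model. This is a minimal sketch under stated assumptions, not the schedule Oases actually uses: the function names, the two-stream model, and the assumption that both compute and communication time scale linearly with chunk size are all hypothetical simplifications introduced here. Each layer is a `(compute, comm)` pair of time costs. In the serial case every layer's communication blocks the next layer; in the overlapped case the batch is split into chunks so that one chunk can communicate while another computes.

```python
def serial_time(layers):
    """Total time when each layer's communication blocks the next computation."""
    return sum(compute + comm for compute, comm in layers)


def overlapped_time(layers, num_chunks=2):
    """Total time when the batch is split into chunks that overlap.

    Toy model (an assumption, not the paper's schedule): one compute stream
    and one communication stream. Chunk k of a layer can start computing only
    after chunk k of the previous layer has finished communicating, which
    preserves the data dependence between communication and computation.
    """
    compute_free = 0.0            # time at which the compute stream is idle
    comm_free = 0.0               # time at which the communication stream is idle
    ready = [0.0] * num_chunks    # time at which each chunk's input is available
    for compute, comm in layers:
        for k in range(num_chunks):
            # Compute chunk k once its input has arrived and the stream is free.
            start = max(compute_free, ready[k])
            compute_free = start + compute / num_chunks
            # Communicate chunk k's result once computed and the link is free.
            comm_start = max(comm_free, compute_free)
            comm_free = comm_start + comm / num_chunks
            ready[k] = comm_free
    return max(ready)


# Three identical layers: 4 time units of compute, 2 of communication each.
example_layers = [(4.0, 2.0)] * 3
print(serial_time(example_layers), overlapped_time(example_layers))
```

For the example above the model gives 18.0 time units serially and 13.0 with two chunks, because most communication is hidden behind the other chunk's computation. Real systems deviate from this: smaller chunks run kernels less efficiently and each collective pays a fixed latency, which is one reason a cost-aware planner is useful.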
Taxonomy
Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Parallel Computing and Optimization Techniques
