Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models

Zhiquan Lai; Shengwei Li; Xudong Tang; Keshi Ge; Weijie Liu; Yabo Duan; Linbo Qiao; Dongsheng Li

arXiv:2206.04959 · cs.LG · March 22, 2023 · 1 citation

TL;DR

Merak is an automated, resource-efficient 3D parallelism framework (combining data, pipeline, and tensor model parallelism) for training large foundation models; it reduces manual parallelization effort and improves training speed on GPU clusters.

Contribution

It introduces an automated model partitioner and a high-performance runtime engine that enhance resource utilization and simplify distributed training of giant models.
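Merak's actual partitioning algorithm is not described on this page, so the following is only a hypothetical sketch of the underlying problem an automated pipeline partitioner has to solve: given per-layer cost estimates (assumed here to come from profiling or a cost model), split the layers into contiguous pipeline stages so that the slowest stage, which bounds pipeline throughput, is as cheap as possible. It uses the classic linear-partition dynamic program; the function name `partition_layers` and its inputs are illustrative assumptions, not Merak's API.

```python
from typing import List, Sequence


def partition_layers(costs: Sequence[float], num_stages: int) -> List[List[int]]:
    """Split layer indices into `num_stages` contiguous, non-empty stages,
    minimizing the cost of the most expensive stage (linear-partition DP).

    Hypothetical illustration; `costs` are assumed per-layer cost estimates.
    """
    n = len(costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")

    # prefix[i] = sum(costs[:i]); layers [i, j) then cost prefix[j] - prefix[i].
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][j]: smallest achievable bottleneck when the first j layers form k stages.
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    # cut[k][j]: start index of the k-th (last) stage in that optimal split.
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0

    for k in range(1, num_stages + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last stage holds layers [i, j)
                candidate = max(best[k - 1][i], prefix[j] - prefix[i])
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    cut[k][j] = i

    # Walk the cut table backwards to recover the stage boundaries.
    stages: List[List[int]] = []
    j = n
    for k in range(num_stages, 0, -1):
        i = cut[k][j]
        stages.append(list(range(i, j)))
        j = i
    return stages[::-1]
```

For example, `partition_layers([1, 2, 3, 4, 5, 6], 3)` returns `[[0, 1, 2], [3, 4], [5]]`, whose most expensive stage costs 9. The dynamic program runs in O(k·n²) time for n layers and k stages, which is cheap at typical layer counts; a real partitioner would also have to account for memory limits and inter-stage communication, which this sketch ignores.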

Findings

1. Achieves up to 1.61x speedup over state-of-the-art frameworks.
2. Automates model parallelism with minimal code modifications.
3. Effectively utilizes GPU resources and overlaps communication with computation.

Abstract

Foundation models are becoming the dominant deep learning technologies. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. Besides being computing-intensive, the training process is extremely memory-intensive and communication-intensive. These features make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism and tensor model parallelism, to achieve high training efficiency. To achieve this goal, custom software frameworks such as Megatron-LM and DeepSpeed have been developed. However, current 3D parallelism frameworks still face two issues: i) they are not transparent to model developers, who need to manually modify the model to parallelize training; ii) their utilization of computation, GPU memory and network bandwidth is insufficient. We propose Merak, an…
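To make the 3D decomposition named in the abstract concrete, the sketch below arranges `dp * pp * tp` GPU ranks in a grid and lists which ranks communicate along each parallel axis. The rank layout (tensor-parallel index varying fastest, so tensor-parallel peers get adjacent ranks) and the function name `build_3d_groups` are assumptions for illustration; this is not Merak's or Megatron-LM's API.

```python
from itertools import product
from typing import Dict, List


def build_3d_groups(dp: int, pp: int, tp: int) -> Dict[str, List[List[int]]]:
    """Group dp * pp * tp ranks by the parallel axis along which they communicate.

    Assumed layout: rank = d * (pp * tp) + p * tp + t, so the tensor-parallel
    index t varies fastest and tensor-parallel peers get adjacent ranks.
    """
    def rank(d: int, p: int, t: int) -> int:
        return d * pp * tp + p * tp + t

    return {
        # Same replica and stage, different tensor shards: per-layer collectives.
        "tensor": [[rank(d, p, t) for t in range(tp)]
                   for d, p in product(range(dp), range(pp))],
        # Same replica and shard, different stages: point-to-point activations/gradients.
        "pipeline": [[rank(d, p, t) for p in range(pp)]
                     for d, t in product(range(dp), range(tp))],
        # Same stage and shard, different replicas: gradient all-reduce.
        "data": [[rank(d, p, t) for d in range(dp)]
                 for p, t in product(range(pp), range(tp))],
    }


if __name__ == "__main__":
    for axis, groups in build_3d_groups(dp=2, pp=2, tp=2).items():
        print(axis, groups)
```

With `dp = pp = tp = 2` (8 GPUs), the tensor groups are `[[0, 1], [2, 3], [4, 5], [6, 7]]`, the pipeline groups `[[0, 2], [1, 3], [4, 6], [5, 7]]`, and the data-parallel groups `[[0, 4], [1, 5], [2, 6], [3, 7]]`. Keeping tensor-parallel peers adjacent is a common choice because tensor parallelism communicates on every layer and benefits most from fast intra-node links; in a PyTorch job each rank list could then be passed to `torch.distributed.new_group`.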

Code & Models

Repositories

hpdl-group/merak
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Graph Theory and Algorithms