MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

Samuel Hsia; Alicia Golden; Bilge Acun; Newsha Ardalani; Zachary DeVito; Gu-Yeon Wei; David Brooks; Carole-Jean Wu

arXiv:2310.02784·cs.DC·June 12, 2024

TL;DR

This paper introduces MAD-Max, a performance modeling framework for optimizing parallelization strategies in distributed training and inference of large machine learning models. Applied to real-world models on GPU clusters, it shows potential throughput gains of up to 2.24x for pre-training and up to 5.2x for inference.

Contribution

MAD-Max is a novel framework that models and optimizes parallelization strategies for large-scale ML training, enabling hardware-software co-design and substantial performance improvements.

Findings

01

Communication that does not overlap with computation accounts for 14-32% of all GPU hours in large model training.

02

MAD-Max shows potential throughput gains of up to 2.24x for pre-training.

03

MAD-Max shows potential throughput gains of up to 5.2x for inference.

Abstract

Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructure, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructure, reveals that 14-32% of all GPU hours are spent on communication with no overlapping computation. To minimize this exposed communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. The framework is designed to optimize parallelization strategies and to facilitate hardware-software co-design. By applying MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput improvements of up to 2.24x for pre-training and up to 5.2x for inference.
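To make the "communication with no overlapping computation" metric concrete, the sketch below computes exposed communication time from two lists of time intervals on one GPU. It is an illustration only, not the MAD-Max implementation: the function names `merge_intervals` and `exposed_communication`, the interval representation, and the example numbers are all assumptions introduced here.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def exposed_communication(compute, comm):
    """Total communication time not hidden behind any compute interval.

    compute, comm: lists of (start, end) tuples in the same time unit.
    Hypothetical helper for illustration; not taken from the paper.
    """
    compute = merge_intervals(compute)
    comm = merge_intervals(comm)
    exposed = 0.0
    for c_start, c_end in comm:
        covered = 0.0
        for m_start, m_end in compute:
            overlap = min(c_end, m_end) - max(c_start, m_start)
            if overlap > 0:
                covered += overlap
        exposed += (c_end - c_start) - covered
    return exposed


# Illustrative timeline (made-up numbers): two compute kernels and one
# collective that partially overlaps both of them.
compute = [(0, 4), (6, 10)]
comm = [(3, 7)]
step_time = 10

exposed = exposed_communication(compute, comm)
print(exposed, exposed / step_time)
```

In this made-up timeline the collective runs for 4 time units, 2 of which overlap with compute, so 2 units (20% of the step) are exposed. The 14-32% figure in the abstract is this kind of fraction, aggregated over GPU hours of real training jobs.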

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management