Accelerating Large Language Model Training with Hybrid GPU-based Compression

Lang Xu; Quentin Anthony; Qinghua Zhou; Nawras Alnaasan; Radha R. Gulhane; Aamir Shafi; Hari Subramoni; Dhabaleswar K. Panda

arXiv:2409.02423 · cs.DC · September 5, 2024

TL;DR

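This paper explores GPU-based compression integrated with MPI collectives to accelerate distributed large language model training. Naïve compression raises throughput by up to 22.5% TFLOPS per GPU, and a hybrid scheme that tunes compression per parallelism dimension still delivers a 17.3% gain while maintaining training-loss convergence.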

Contribution

The paper introduces a hybrid compression scheme tailored to each parallelism dimension in LLM training, reducing communication cost and improving training throughput.
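The idea of tailoring compression to each parallelism dimension can be pictured as a per-dimension policy table, contrasted with a naïve policy that applies one setting everywhere. This is a minimal sketch only: the names `ParallelDim`, `CompressionSetting`, `NAIVE_POLICY`, `HYBRID_POLICY`, `wire_bytes`, and `compression_ratio` are hypothetical, and the bit budgets are illustrative values, not the paper's configuration or any MPI library's API.

```python
from dataclasses import dataclass
from enum import Enum


class ParallelDim(Enum):
    """The three parallelism dimensions whose collectives could be compressed."""
    DATA = "data"          # e.g. gradient all-reduce across data-parallel replicas
    TENSOR = "tensor"      # e.g. collectives within a tensor-parallel group
    PIPELINE = "pipeline"  # e.g. activation send/recv between pipeline stages


@dataclass(frozen=True)
class CompressionSetting:
    enabled: bool
    bits_per_value: int = 32  # hypothetical fixed-rate budget per fp32 value


# Hypothetical bit budgets for illustration only; not the paper's settings.
NAIVE_POLICY = {dim: CompressionSetting(True, 8) for dim in ParallelDim}
HYBRID_POLICY = {
    ParallelDim.DATA: CompressionSetting(True, 8),       # more aggressive
    ParallelDim.TENSOR: CompressionSetting(True, 16),    # milder
    ParallelDim.PIPELINE: CompressionSetting(True, 16),  # milder
}


def wire_bytes(num_values: int, dim: ParallelDim, policy: dict) -> int:
    """Bytes sent for `num_values` fp32 values under the policy for `dim`."""
    setting = policy[dim]
    bits = setting.bits_per_value if setting.enabled else 32
    return (num_values * bits + 7) // 8


def compression_ratio(dim: ParallelDim, policy: dict) -> float:
    """Uncompressed fp32 size divided by compressed size for `dim`."""
    setting = policy[dim]
    return 32 / setting.bits_per_value if setting.enabled else 1.0
```

For example, `compression_ratio(ParallelDim.TENSOR, HYBRID_POLICY)` returns `2.0`, versus `4.0` under `NAIVE_POLICY`. Keeping the policy as data rather than hard-coding it per collective makes it easy to trade throughput against accuracy one dimension at a time.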

Findings

1. 22.5% increase in TFLOPS per GPU with naïve compression
2. 17.3% increase in TFLOPS per GPU with hybrid compression
3. Training-loss convergence maintained with improved efficiency

Abstract

Data Parallelism (DP), Tensor Parallelism (TP), and Pipeline Parallelism (PP) are three widely adopted strategies for fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive communication routines to collect, aggregate, and redistribute gradients, activations, and other important model information, which imposes significant overhead. Co-designed with GPU-based compression libraries, MPI libraries have been shown to significantly reduce message size and better leverage interconnect bandwidth, thus increasing training efficiency while maintaining acceptable accuracy. In this work, we investigate the efficacy of compression-assisted MPI collectives in the context of distributed LLM training using 3D parallelism and ZeRO optimizations. We scaled up to 192 V100 GPUs on the Lassen supercomputer. First, we enabled a naïve compression…
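To make "compression-assisted collectives" concrete, the sketch below simulates an all-reduce in which each rank compresses its buffer before the exchange. Assumptions: a simple 8-bit uniform quantizer stands in for the GPU compression libraries the abstract mentions, the all-reduce is modeled in a single process as an allgather-then-sum rather than run through a real MPI library, and all function names are hypothetical.

```python
import random


def quantize(values, bits=8):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return lo, scale, codes


def dequantize(lo, scale, codes):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def compressed_size_bytes(num_codes, bits=8):
    """Packed code bytes plus an 8-byte header holding (lo, scale) as fp32."""
    return (num_codes * bits + 7) // 8 + 8


def compressed_allreduce(rank_buffers, bits=8):
    """Single-process model of an allgather-style all-reduce with compression.

    Each simulated rank compresses its buffer, the compressed buffers are
    "exchanged", and the decompressed buffers are summed element-wise.
    Returns the reduced vector and the total size of the compressed buffers
    (one per rank).
    """
    wire = 0
    decoded = []
    for buf in rank_buffers:
        lo, scale, codes = quantize(buf, bits)
        wire += compressed_size_bytes(len(codes), bits)
        decoded.append(dequantize(lo, scale, codes))
    reduced = [sum(col) for col in zip(*decoded)]
    return reduced, wire


if __name__ == "__main__":
    random.seed(0)
    ranks, n = 4, 1024
    bufs = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(ranks)]
    reduced, wire = compressed_allreduce(bufs, bits=8)
    exact = [sum(col) for col in zip(*bufs)]
    max_err = max(abs(a - b) for a, b in zip(reduced, exact))
    print(f"compressed bytes: {wire} (uncompressed fp32: {ranks * n * 4})")
    print(f"max abs error after reduction: {max_err:.4f}")
```

With four ranks of 1,024 values, the compressed buffers total 4,128 bytes against 16,384 bytes of raw fp32. The price is a per-element error bounded by half a quantization step per rank, which is the kind of trade-off that "acceptable accuracy" has to be judged against.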

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: ZeRO