Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications

Seonho Lee; Jihwan Oh; Junkyum Kim; Seokjin Go; Jongse Park; Divya Mahajan

arXiv:2507.03114 · cs.DC · July 8, 2025

TL;DR

This paper systematically evaluates GPU-accelerated distributed training, revealing that overlapping compute and communication can cause significant slowdowns and increased power consumption, highlighting complex trade-offs in optimization strategies.

Contribution

It provides a comprehensive analysis of the effects of overlapping strategies on performance and power in GPU-based distributed deep learning, considering hardware features and power capping.

Findings

01

Overlapping compute and communication can slow down training by up to 40%.

02

Sequential execution can be more efficient than overlapping in certain scenarios.

03

Power consumption can increase due to resource contention during overlapping execution.
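
The trade-off behind findings 01 and 02 can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not the paper's methodology: the function names, the single `compute_slowdown` factor applied to all compute, and the example durations are all hypothetical. It only shows why overlap stops paying off once the contention penalty on compute exceeds the communication time being hidden.

```python
def sequential_time(compute_s: float, comm_s: float) -> float:
    """Step time when communication runs only after compute finishes."""
    return compute_s + comm_s


def overlapped_time(compute_s: float, comm_s: float, compute_slowdown: float) -> float:
    """Step time when compute and communication run concurrently.

    Assumption (hypothetical model): contention stretches all compute by a
    factor of (1 + compute_slowdown), and the step ends when the slower of
    the two concurrent activities ends.
    """
    return max(compute_s * (1.0 + compute_slowdown), comm_s)


def overlap_is_faster(compute_s: float, comm_s: float, compute_slowdown: float) -> bool:
    """True when the overlapped step finishes before the sequential one."""
    return overlapped_time(compute_s, comm_s, compute_slowdown) < sequential_time(compute_s, comm_s)


if __name__ == "__main__":
    # Illustrative inputs only: 1.0 s of compute with a 40% contention penalty.
    for comm_s in (0.3, 0.8):
        seq = sequential_time(1.0, comm_s)
        ovl = overlapped_time(1.0, comm_s, 0.4)
        print(f"comm={comm_s:.1f}s sequential={seq:.2f}s overlapped={ovl:.2f}s")
```

In this model, overlapping is faster only while the extra compute time caused by contention stays below the communication time it hides; with short communication and a large slowdown, sequential execution finishes first.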

Abstract

This paper provides an in-depth characterization of GPU-accelerated systems to understand the interplay between overlapped computation and communication, a technique commonly employed in distributed training. Because models are large, they must be distributed across multiple devices. Overlapping strategies, which enable concurrent computation and communication, are critical for mitigating communication bottlenecks and maximizing GPU utilization. However, the current consensus is that compute and communication should always be overlapped aggressively to mitigate the overhead of distribution. By systematically evaluating state-of-the-art GPUs, this study investigates the impact of hardware features such as numeric precision, specialized cores, and power capping on distributed training workloads. Comprehensive experiments and studies showcase the effects of overlapping…
