Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap
Shagnik Pal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam, Lizy K. John

TL;DR
This paper introduces FiCCO, a finer-grain compute-communication overlap technique for distributed ML, which enhances performance by exploring deeper scheduling options and offloading communication to GPU DMA engines.
Contribution
The paper proposes FiCCO, an approach for deeper compute-communication overlap in distributed ML, characterizes the inefficiency trade-offs involved, and develops heuristics for schedule selection to improve performance.
Findings
Achieves up to 1.6x speedup in ML training scenarios.
Heuristics guide schedule selection with 81% accuracy.
Demonstrates a wider design space and efficiency gains compared with shard-level overlap.
Abstract
As both ML training and inference are increasingly distributed, parallelization techniques that shard (divide) an ML model across the GPUs of a distributed system are often deployed. With such techniques, there is a high prevalence of data-dependent communication and computation operations where communication is exposed, leaving as much as 1.7x ideal performance on the table. Prior works harness the fact that ML model state and inputs are already sharded, and employ careful overlap of individual computation/communication shards. While such coarse-grain overlap is promising, in this work we instead make a case for finer-grain compute-communication overlap, which we term FiCCO: overlap at a finer granularity, one level deeper than shard level, to unlock compute/communication overlap for a wider set of network topologies, finer-grain dataflow and more. We show that FiCCO opens…
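To make the idea of overlapping below shard level concrete, here is a minimal toy timing model. It is a sketch and not the paper's model or scheduler: it assumes a single communication step whose output feeds a single computation step, splits both into equal chunks, and pipelines them. The function name `pipelined_time` and the `per_chunk_overhead` parameter are hypothetical, the latter standing in for whatever inefficiency finer chunking introduces. Contention between transfers and compute is ignored.

```python
def pipelined_time(comm_total, comp_total, chunks, per_chunk_overhead=0.0):
    """Finish time of a two-stage pipeline (toy model, hypothetical names).

    Chunk i is transferred first and computed on afterwards; transfers run
    back to back, and compute on chunk i waits for both its own data and the
    compute on chunk i-1. chunks=1 is the fully serialized case.
    """
    comm_chunk = comm_total / chunks + per_chunk_overhead
    comp_chunk = comp_total / chunks + per_chunk_overhead
    comm_done = 0.0  # time at which the latest transfer finished
    comp_done = 0.0  # time at which the latest compute finished
    for _ in range(chunks):
        comm_done += comm_chunk
        comp_done = max(comp_done, comm_done) + comp_chunk
    return comp_done


if __name__ == "__main__":
    # Assumed, illustrative costs: 6 time units of communication, 4 of compute.
    for k in (1, 4, 16):
        ideal = pipelined_time(6.0, 4.0, k)
        costly = pipelined_time(6.0, 4.0, k, per_chunk_overhead=0.5)
        print(f"chunks={k:2d}  no overhead: {ideal:5.2f}  with overhead: {costly:5.2f}")
```

With these assumed costs, splitting hides most of the compute behind the transfers when chunking is free, while a fixed per-chunk cost can make very fine chunking slower than no overlap at all. That tension is the kind of trade-off a schedule-selection heuristic has to weigh.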
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications
