Composing Distributed Computations Through Task and Kernel Fusion

Rohan Yadav; Shiv Sundram; Wonchan Lee; Michael Garland; Michael Bauer; Alex Aiken; Fredrik Kjolstad

arXiv:2406.18109·cs.DC·December 17, 2024

Open Access

TL;DR

Diffuse is a system that dynamically fuses tasks and kernels in distributed, task-based runtime systems, accelerating unmodified applications by 1.86x on average (geo-mean) on up to 128 GPUs.

Contribution

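Diffuse introduces an intermediate representation of distributed computation that makes the analyses needed for task fusion scalable, and pairs task fusion with a JIT compiler that fuses the kernels within fused tasks. The representation is general enough to serve as a target for two real-world task-based libraries, cuNumeric and Legate Sparse.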

Findings

01  Achieves a 1.86x average (geo-mean) speedup on unmodified applications, ranging from 0.93x to 10.7x

02  Evaluated on up to 128 GPUs

03  Identifies optimization opportunities missed by developers

Abstract

We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within fused tasks. We show empirically that Diffuse's intermediate representation is general enough to be a target for two real-world, task-based libraries (cuNumeric and Legate Sparse), letting Diffuse find optimization opportunities across function and library boundaries. Diffuse accelerates unmodified applications developed by composing task-based libraries by 1.86x on average (geo-mean), and by between 0.93x and 10.7x on up to 128 GPUs. Diffuse also finds optimization opportunities missed by the…
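To make the idea of fusion concrete, the sketch below contrasts running two elementwise tasks as separate passes with running them as one fused pass. It is an illustration only and is not Diffuse's intermediate representation, analysis, or API: the `Task` tuple, `run_unfused`, and `run_fused` are hypothetical names, and the sketch simply assumes the tasks are elementwise over equally sized arrays, which is what makes fusing them legal here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical task description: (output name, elementwise function, input names).
Task = Tuple[str, Callable[..., float], Tuple[str, ...]]


def run_unfused(tasks: Sequence[Task], arrays: Dict[str, List[float]]):
    """Run every task as its own pass, materializing each intermediate array."""
    env = dict(arrays)
    passes = 0
    for out, fn, ins in tasks:
        # One full sweep over the data per task.
        env[out] = [fn(*vals) for vals in zip(*(env[name] for name in ins))]
        passes += 1
    return env, passes


def run_fused(tasks: Sequence[Task], arrays: Dict[str, List[float]],
              keep: Sequence[str]):
    """Run all tasks in one pass; intermediates exist only as per-element scalars."""
    n = len(next(iter(arrays.values())))
    results: Dict[str, List[float]] = {name: [] for name in keep}
    for i in range(n):
        scalars = {name: arr[i] for name, arr in arrays.items()}
        for out, fn, ins in tasks:
            scalars[out] = fn(*(scalars[name] for name in ins))
        # Only the outputs that are still needed are written back.
        for name in keep:
            results[name].append(scalars[name])
    return results, 1


# z = 2*x + y expressed as two small tasks, as a composed library call might issue them.
tasks: List[Task] = [
    ("t", lambda x: 2.0 * x, ("x",)),
    ("z", lambda t, y: t + y, ("t", "y")),
]
arrays = {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}

unfused_env, unfused_passes = run_unfused(tasks, arrays)
fused_out, fused_passes = run_fused(tasks, arrays, keep=("z",))
```

Both versions compute the same `z`, but the fused version sweeps the data once and never allocates the temporary `t`. In a distributed setting the saving would also include the per-task launch and scheduling overhead, which this single-process sketch does not model.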

Taxonomy

Topics: Neural Networks and Applications