Iris: First-Class Multi-GPU Programming Experience in Triton
Muhammad Awad, Muhammad Osama, Brandon Potter

TL;DR
Iris is a Python and Triton-based multi-GPU communication library that simplifies programming while achieving high performance, enabling seamless computation-communication overlap and outperforming existing libraries in key workloads.
Contribution
The paper introduces Iris, a high-level multi-GPU communication library written in Python and Triton that simplifies development and achieves near-optimal performance.
Findings
Achieves up to 1.79x speedup over PyTorch and RCCL.
Provides a taxonomy of compute-communication overlap patterns.
Matches or exceeds performance of heavily-optimized libraries.
Abstract
Multi-GPU programming traditionally requires developers to navigate complex trade-offs between performance and programmability. High-performance implementations typically rely on low-level HIP/CUDA communication libraries that demand substantial engineering effort for even basic overlap patterns, while simpler abstractions often sacrifice performance. We present Iris, a multi-GPU communication library implemented entirely in Python and Triton that eliminates this trade-off. Iris provides tile-based symmetric memory abstractions that naturally align with Triton's programming model, enabling developers to write single-source kernels that seamlessly interleave computation and communication. We demonstrate a taxonomy of compute-communication overlap patterns--from bulk-synchronous to fine-grained workgroup specialization--that can be implemented with minimal code changes in Iris, often…
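To make the two ends of that overlap taxonomy concrete, here is a minimal timing sketch in plain Python. It does not use Iris or Triton and is not the authors' method: the function names, the one-compute-unit/one-link model, and the per-tile costs are all illustrative assumptions. It only contrasts a bulk-synchronous schedule (compute every tile, then communicate every tile) with a fine-grained schedule in which tile i is sent while tile i+1 is being computed.

```python
# Toy cost model (hypothetical; not Iris code and not measured data).
# Assumes one compute unit and one communication link, each handling
# one tile at a time, with fixed per-tile costs in arbitrary time units.


def bulk_synchronous_time(num_tiles, compute_per_tile, comm_per_tile):
    """Compute all tiles first, then communicate all of them."""
    return num_tiles * compute_per_tile + num_tiles * comm_per_tile


def fine_grained_overlap_time(num_tiles, compute_per_tile, comm_per_tile):
    """Send each tile as soon as it is computed and the link is free."""
    compute_done = 0.0  # time at which the current tile finishes computing
    link_free = 0.0     # time at which the link finishes its previous send
    for _ in range(num_tiles):
        compute_done += compute_per_tile
        send_start = max(compute_done, link_free)
        link_free = send_start + comm_per_tile
    return link_free


if __name__ == "__main__":
    # Illustrative inputs only.
    tiles, compute, comm = 8, 2.0, 1.0
    print("bulk-synchronous:", bulk_synchronous_time(tiles, compute, comm))
    print("fine-grained:    ", fine_grained_overlap_time(tiles, compute, comm))
```

Under this model, when communication is cheaper than computation the overlapped schedule hides all but the last tile's transfer; when communication dominates, it hides all but the first tile's compute. Real kernels face contention and synchronization costs that this sketch ignores.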
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Network Packet Processing and Optimization
