HiCCL: A Hierarchical Collective Communication Library
Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-mei Hwu, William Gropp, Alex Aiken

TL;DR
HiCCL is a flexible, hierarchical collective communication library that significantly improves throughput and portability across diverse GPU systems by decoupling communication logic from hardware-specific optimizations.
Contribution
HiCCL introduces a compositional API for collective communication that adapts to various hardware hierarchies, enabling high performance and portability.
Findings
Achieves 17x higher throughput than specialized MPI implementations
Provides competitive performance with vendor-specific libraries
Demonstrates portability across Nvidia, AMD, and Intel GPU systems
Abstract
HiCCL (Hierarchical Collective Communication Library) addresses the growing complexity and diversity in high-performance network architectures. As GPU systems have evolved into networks of GPUs with different multilevel communication hierarchies, optimizing each collective function for a specific system has become a challenging task. Consequently, many collective libraries struggle to adapt to different hardware and software, especially across systems from different vendors. HiCCL's library design decouples the collective communication logic from network-specific optimizations through a compositional API. The communication logic is composed using multicast, reduction, and fence primitives, which are then factorized for a specified network hierarchy using only point-to-point operations within a level. Finally, striping and pipelining optimizations are applied as specified for streamlining…
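The factorization step described in the abstract can be illustrated with a minimal Python sketch. This is not HiCCL's API or its actual algorithm: the function name `factorize_multicast`, its parameters, and the rank-numbering scheme are assumptions made purely for illustration. The sketch splits a one-to-all multicast into point-to-point sends, one hierarchy level at a time, so that data crosses each outer level only once before fanning out inside the inner levels.

```python
def factorize_multicast(root, num_ranks, hierarchy):
    """Split a one-to-all multicast into per-level point-to-point sends.

    Hypothetical illustration, not HiCCL's API. `hierarchy` lists the
    branching factor of each level, outermost first, e.g. [2, 4] for
    2 nodes with 4 GPUs each. Ranks are assumed to be numbered
    contiguously within each group.
    Returns a list of (level, src, dst) tuples.
    """
    total = 1
    for factor in hierarchy:
        total *= factor
    if total != num_ranks:
        raise ValueError("product of hierarchy factors must equal num_ranks")

    sends = []
    holders = [root]        # ranks that already hold the data
    group_size = num_ranks  # size of the group each holder must cover
    for level, factor in enumerate(hierarchy):
        sub_size = group_size // factor
        new_holders = []
        for src in holders:
            group_start = (src // group_size) * group_size
            offset = src % sub_size  # same local position in every subgroup
            for k in range(factor):
                dst = group_start + k * sub_size + offset
                if dst != src:
                    sends.append((level, src, dst))
                new_holders.append(dst)
        holders = new_holders
        group_size = sub_size
    return sends


# Example: 8 GPUs arranged as 2 nodes x 4 GPUs, broadcasting from rank 0.
for level, src, dst in factorize_multicast(root=0, num_ranks=8, hierarchy=[2, 4]):
    print(f"level {level}: {src} -> {dst}")
```

For the 2 x 4 example, the sketch produces a single send at level 0 (between nodes) followed by six sends at level 1 (within nodes), rather than four sends from the root across the slower inter-node link. Striping and pipelining, which the abstract mentions as later optimizations, are not modeled here.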
Taxonomy
Topics: Multimedia Communication and Technology
