Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

Prithwish Basu; Liangyu Zhao; Jason Fantl; Siddharth Pal; Arvind Krishnamurthy; Joud Khoury

arXiv:2309.13541·cs.DC·April 29, 2024

Open Access

TL;DR

This paper develops optimized all-to-all communication schedules for direct-connect supercomputer topologies, addressing algorithmic challenges and proposing a new topology for near-optimal performance in ML and HPC workloads.

Contribution

It introduces a holistic approach to optimize all-to-all communication schedules across various topologies and proposes a novel topology achieving near-optimal performance.

Findings

01. Developed bandwidth-efficient all-to-all schedules for diverse topologies.

02. Lowered schedules to multiple runtimes and interconnect technologies.

03. Proposed a new topology with near-optimal all-to-all performance.

Abstract

The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology and lowering the schedules to various runtimes and interconnect technologies. We also propose a novel topology that delivers near-optimal all-to-all performance.
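As a rough illustration of the primitive the abstract refers to, the sketch below models all-to-all as a transpose of per-node send buffers: node i holds one distinct chunk for every node j, and after the exchange node j holds the chunk each node i prepared for it. This shows only the semantics of the collective, not the paper's schedules or topology; the names `all_to_all` and `network_transfers` are hypothetical and chosen for this example.

```python
def all_to_all(send_buffers):
    """Model the all-to-all exchange on n nodes.

    send_buffers[i][j] is the chunk node i holds for node j.
    The result recv[j][i] is the chunk node j received from node i,
    i.e. the transpose of the send matrix.
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each node needs exactly one chunk per node")
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]


def network_transfers(n):
    """Count chunks that must cross the network.

    Every ordered pair of distinct nodes exchanges one chunk, so the
    traffic grows quadratically with the number of nodes.
    """
    return n * (n - 1)


if __name__ == "__main__":
    n = 4
    send = [[f"chunk {i}->{j}" for j in range(n)] for i in range(n)]
    recv = all_to_all(send)
    print(recv[2][0])            # chunk that node 2 received from node 0
    print(network_transfers(n))  # chunks crossing the network for 4 nodes
```

The quadratic count of n(n-1) point-to-point transfers is one way to see why the abstract describes all-to-all as straining interconnect bandwidth at scale.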

Taxonomy

Topics: Interconnection Networks and Systems · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices