Multi-Plane HyperX: A Low-Latency and Cost-Effective Network for Large-Scale AI and HPC Systems
Ziyu Wang, Fei Lei, Dezun Dong

TL;DR
This paper studies multi-plane HyperX networks and shows that they achieve a smaller network diameter, and therefore lower latency, and better cost-effectiveness than existing topologies such as multi-plane Fat-Tree, Dragonfly, and Dragonfly+ in large-scale AI and HPC systems.
Contribution
It applies multi-plane technology to HyperX, a direct network, which the authors report no prior literature has explored, and shows a smaller network diameter and better cost-effectiveness than prior architectures.
Findings
Multi-plane HyperX achieves a significantly smaller network diameter than multi-plane Fat-Tree, Dragonfly, and Dragonfly+.
It outperforms multi-plane Fat-Tree, Dragonfly, and Dragonfly+ in cost-effectiveness.
The architecture is suitable for large-scale AI and HPC data centers.
Abstract
Multi-plane architectures have become increasingly prevalent in the Fat-Tree networks of AI data centers. By leveraging multiple ports on a single network interface card (NIC) or multiple NICs within a scale-up domain, each port or NIC is allocated to an independent network plane, thereby provisioning the overall system with multiple network planes. However, no prior literature has explored the application of multi-plane technologies to direct networks such as HyperX. This paper investigates the multi-plane HyperX network and demonstrates that, compared to state-of-the-art network topologies like multi-plane Fat-Tree, Dragonfly, and Dragonfly+, the multi-plane HyperX architecture achieves a significantly smaller network diameter and superior cost-effectiveness.
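To make the two ideas in the abstract concrete, here is a minimal Python sketch, not the authors' construction. It assumes the standard HyperX definition (switches sit on a multi-dimensional lattice and every pair of switches that differ in exactly one coordinate is directly linked) and a one-port-per-plane mapping. The names `hyperx_diameter`, `assign_ports_to_planes`, and the example shapes are hypothetical and chosen only for illustration.

```python
from collections import deque


def hyperx_neighbors(switch, shape):
    """Yield every switch that differs from `switch` in exactly one coordinate."""
    for dim, size in enumerate(shape):
        for value in range(size):
            if value != switch[dim]:
                yield switch[:dim] + (value,) + switch[dim + 1:]


def hyperx_diameter(shape):
    """Switch-to-switch diameter of one HyperX plane, in hops.

    BFS from a single switch is enough here because the graph is
    vertex-transitive: every switch sees the same distance profile.
    """
    start = tuple(0 for _ in shape)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in hyperx_neighbors(current, shape):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return max(dist.values())


def assign_ports_to_planes(num_ports):
    """Assumed mapping: NIC port i attaches to independent plane i."""
    return {port: port for port in range(num_ports)}


if __name__ == "__main__":
    # Illustrative shapes only; the paper's configurations are not given here.
    for shape in [(8,), (4, 4), (3, 3, 3)]:
        print(shape, "diameter:", hyperx_diameter(shape))
    print("port -> plane:", assign_ports_to_planes(4))
```

In this model the diameter of a plane equals the number of dimensions with more than one switch, since each hop can correct one coordinate. The sketch does not model cost, or how the paper sizes each plane.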
