The Hitchhiker's Guide to Programming and Optimizing Cache Coherent Heterogeneous Systems: CXL, NVLink-C2C, and AMD Infinity Fabric
Zixuan Wang, Suyash Mahar, Luyi Li, Jangseon Park, Jinpyo Kim, Theodore Michailidis, Yue Pan, Mingyao Shen, Tajana Rosing, Dean Tullsen, Steven Swanson, Jishen Zhao

TL;DR
This paper analyzes modern cache-coherent heterogeneous systems interconnected by CXL, NVLink-C2C, and Infinity Fabric, introducing a benchmark suite to evaluate performance and guide future system optimization.
Contribution
It develops Heimdall, a benchmark suite for profiling heterogeneous systems, and provides detailed analysis and insights into optimizing performance across various cache-coherent interconnects.
Findings
Heimdall effectively profiles heterogeneous system performance.
Identified key architectural factors influencing performance.
Provided recommendations for optimizing cache-coherent systems.
Abstract
We present a thorough analysis of the use of modern heterogeneous systems interconnected by various cachecoherent links, including CXL, NVLink-C2C, and Infinity Fabric. We studied a wide range of server systems that combined CPUs from different vendors and various types of coherent memory devices, including CXL memory expander, CXL pool, CXL shared memory, GH200 GPU, and AMD MI300a HBM. For this study, we developed a heterogeneous memory benchmark suite, Heimdall, to profile the performance of such heterogeneous systems and present a detailed performance comparison across systems. By leveraging H E I M DA L L , we unveiled the detailed architecture design in these systems, drew observations on optimizing performance for workloads, and pointed out directions for future development of cache coherent heterogeneous systems.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
