Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure
Myoungsoo Jung

TL;DR
This paper analyzes the scalability challenges of modern AI hardware and proposes a modular, disaggregated data center architecture that uses CXL and optimized interconnects to improve scalability and efficiency.
Contribution
It introduces a novel modular data center architecture with CXL and hybrid interconnects, addressing scalability bottlenecks in AI hardware infrastructure.
Findings
Evaluations demonstrate enhanced scalability and throughput.
Hybrid CXL-over-XLink design reduces data transfer overhead.
Hierarchical memory models improve resource flexibility.
Abstract
Modern AI workloads such as large language models (LLMs) and retrieval-augmented generation (RAG) impose severe demands on memory, communication bandwidth, and resource flexibility. Traditional GPU-centric architectures struggle to scale because inter-GPU communication overheads grow with system size. This report introduces key AI concepts and explains how Transformers revolutionized data representation in LLMs. We analyze large-scale AI hardware and data center designs, identifying scalability bottlenecks in hierarchical systems. To address these, we propose a modular data center architecture based on Compute Express Link (CXL) that enables disaggregated scaling of memory, compute, and accelerators. We further explore accelerator-optimized interconnects, collectively termed XLink (e.g., UALink, NVLink, NVLink Fusion), and introduce a hybrid CXL-over-XLink design to reduce long-distance data transfers…
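The claim that inter-GPU communication overheads grow with scale can be illustrated with the textbook alpha-beta cost model for a ring all-reduce (reduce-scatter followed by all-gather). This is a minimal sketch under that assumption, not the paper's own model; the function names and every parameter value (gradient size, link bandwidth, per-hop latency, single-GPU compute time) are hypothetical and chosen only for illustration.

```python
def ring_allreduce_time(n_gpus: int, msg_bytes: float,
                        bw_bytes_per_s: float, latency_s: float) -> float:
    """Alpha-beta cost of a ring all-reduce: 2(N-1) steps, each moving S/N bytes."""
    if n_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    steps = 2 * (n_gpus - 1)          # N-1 reduce-scatter + N-1 all-gather steps
    chunk = msg_bytes / n_gpus        # each step sends one 1/N-sized chunk per link
    return steps * (latency_s + chunk / bw_bytes_per_s)


def step_time(n_gpus: int, total_compute_s: float, msg_bytes: float,
              bw_bytes_per_s: float, latency_s: float) -> tuple[float, float]:
    """Return (compute, communication) seconds per step under ideal data parallelism."""
    compute = total_compute_s / n_gpus  # compute splits evenly across GPUs
    comm = ring_allreduce_time(n_gpus, msg_bytes, bw_bytes_per_s, latency_s)
    return compute, comm


if __name__ == "__main__":
    # Hypothetical workload parameters, for illustration only.
    TOTAL_COMPUTE_S = 10.0   # single-GPU compute time per training step
    GRAD_BYTES = 2e9         # gradient bytes synchronized each step
    LINK_BW = 50e9           # per-link bandwidth in bytes/s
    LINK_LAT = 5e-6          # per-hop latency in seconds

    for n in (8, 64, 512, 4096):
        compute, comm = step_time(n, TOTAL_COMPUTE_S, GRAD_BYTES, LINK_BW, LINK_LAT)
        share = comm / (compute + comm)
        print(f"{n:5d} GPUs: compute {compute * 1e3:8.2f} ms, "
              f"comm {comm * 1e3:8.2f} ms, comm share {share:.0%}")
```

Under this model, per-GPU compute shrinks roughly as 1/N, while the bandwidth term approaches a constant 2S/B and the latency term grows linearly with N, so communication takes a growing share of each step as GPUs are added. This is one simple view of the communication overhead the abstract describes.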
Taxonomy
Topics: Computability, Logic, AI Algorithms
