Octopus: Enhancing CXL Memory Pods via Sparse Topology
Yuhong Zhong, Fiodar Kazhamiaka, Pantea Zardoshti, Shuwei Teng, Rodrigo Fonseca, Mark D. Hill, Daniel S. Berger

TL;DR
Octopus is a switchless CXL memory pod topology that scales to large pods while balancing pooling efficiency against low-latency pairwise communication. RPCs on hardware are 3.2x faster than in-rack RDMA, and simulation shows 3-5.4% server cost savings.
Contribution
The paper proposes a sparse CXL topology with server grouping, in which each pooling device connects to a chosen subset of servers, to improve scalability and performance without switches.
Findings
RPCs measured on hardware are 3.2x faster than in-rack RDMA.
Simulation shows 3-5.4% server cost savings with Octopus.
Octopus outperforms switch-based designs in speed and cost.
Abstract
The Compute Express Link (CXL) interconnect enables compute "pods" that pool memory across servers to reduce cost and improve efficiency. These pods also facilitate pairwise communication whose needs conflict with pooling. Importantly, existing pod designs are small or require indirection through expensive switches. These conventional designs implicitly assume that pods must fully connect all servers to all CXL pooling devices. This paper breaks with this conventional wisdom by introducing Octopus pods. Octopus directly connects servers to low-port-count CXL pooling devices (e.g., 4 ports) yet scales to large pods without switches by constructing a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers. Octopus explicitly balances "overlap", where two servers connect to the same pooling device: overlap reduces pooling efficiency but enables…
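The notion of "overlap" in a sparse topology can be made concrete with a small sketch. The construction below is an illustrative assumption, not the paper's method: it uses a cyclic perfect difference set ({0, 1, 3, 9} mod 13) to wire 13 hypothetical servers to 13 hypothetical 4-port pooling devices, then counts how many devices each pair of servers shares. The names `build_topology`, `overlap_counts`, and `ports_per_server` are made up for this example.

```python
from itertools import combinations

NUM_SERVERS = 13          # assumed pod size for this illustration
OFFSETS = (0, 1, 3, 9)    # perfect difference set mod 13 (4 ports per device)


def build_topology(num_servers=NUM_SERVERS, offsets=OFFSETS):
    """Return one frozenset of server ids per pooling device.

    Device d connects to servers (d + o) mod num_servers for each offset o.
    """
    return [
        frozenset((d + o) % num_servers for o in offsets)
        for d in range(num_servers)
    ]


def overlap_counts(topology, num_servers=NUM_SERVERS):
    """Count, for every server pair, how many pooling devices they share."""
    counts = {pair: 0 for pair in combinations(range(num_servers), 2)}
    for servers in topology:
        for pair in combinations(sorted(servers), 2):
            counts[pair] += 1
    return counts


def ports_per_server(topology, num_servers=NUM_SERVERS):
    """Count how many pooling devices each server is attached to."""
    ports = [0] * num_servers
    for servers in topology:
        for s in servers:
            ports[s] += 1
    return ports


topology = build_topology()
overlaps = overlap_counts(topology)
ports = ports_per_server(topology)

print("ports per device:", sorted({len(s) for s in topology}))
print("ports per server:", sorted(set(ports)))
print("distinct pairwise overlap values:", sorted(set(overlaps.values())))
```

In this particular wiring every device uses 4 ports, every server attaches to 4 devices, and every pair of servers shares exactly one device. A fully connected pod built from the same 4-port devices could hold only 4 servers. Raising or lowering the pairwise overlap is the kind of trade-off the abstract describes between pooling efficiency and pairwise communication.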
