CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, and Myoungsoo Jung

TL;DR
This paper presents a GPU storage expansion solution built on CXL. It lets GPUs use diverse storage media (DRAMs and/or SSDs) as expanded memory, with a custom CXL controller that achieves two-digit nanosecond roundtrip latency.
Contribution
It introduces a new GPU system design with multiple CXL root ports and a custom CXL controller, implemented in silicon at the hardware RTL level, which the authors report as the first in the field to achieve two-digit nanosecond roundtrip latency.
Findings
Significantly outperforms existing methods in the authors' performance evaluations
Achieves two-digit nanosecond roundtrip latency
Uses speculative read and deterministic store mechanisms to hide the latency variation of the endpoint's backend media
Abstract
This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL controller integrated at the hardware RTL level, achieving two-digit nanosecond roundtrip latency, the first in the field. This study also includes speculative read and deterministic store mechanisms to efficiently manage read and write operations to hide the endpoint's backend media latency variation. Performance evaluations reveal our approach significantly outperforms existing methods, marking a substantial advancement in GPU storage technology.
Taxonomy
Topics: Parallel Computing and Optimization Techniques
