CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, and Myoungsoo Jung

arXiv:2506.15601 · cs.AR · June 19, 2025


TL;DR

This paper presents a GPU storage expansion solution built on CXL that reaches two-digit nanosecond roundtrip latency and lets GPUs use diverse storage media (DRAMs and/or SSDs).

Contribution

It introduces a GPU system design with multiple CXL root ports and a custom CXL controller, siliconized and integrated at the hardware RTL level, which the authors report as the first in the field to achieve two-digit nanosecond roundtrip latency.

Findings

01. Significantly outperforms existing methods in the paper's performance evaluations

02. Achieves two-digit nanosecond roundtrip latency

03. Hides the endpoint's backend media latency variation using speculative read and deterministic store mechanisms

Abstract

This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed and siliconized a custom CXL controller integrated at the hardware RTL level, achieving two-digit nanosecond roundtrip latency, the first in the field. This study also includes speculative read and deterministic store mechanisms that manage read and write operations efficiently and hide the endpoint's backend media latency variation. Performance evaluations reveal that our approach significantly outperforms existing methods, marking a substantial advancement in GPU storage technology.

Taxonomy

Topics: Parallel Computing and Optimization Techniques