Analysis and Optimized CXL-Attached Memory Allocation for Long-Context LLM Fine-Tuning

Yong-Cheng Liaw; Shuo-Han Chen

arXiv:2507.03305 · cs.DC · October 31, 2025

Open Access

TL;DR

This paper proposes a CXL-aware memory management approach for long-context LLM fine-tuning that places tensors across local DRAM and CXL add-in card (AIC) memory, reaching 97-99% of DRAM-only throughput with a single AIC and up to a 21% speedup over naive placement.

Contribution

It introduces a PyTorch extension and memory allocator that enable fine-grained tensor control and optimized placement across CXL and DRAM, addressing current framework limitations.
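
To make the idea of per-tensor placement concrete, the following is a minimal sketch of a capacity-aware placement decision. It is not the paper's allocator or its PyTorch extension: the `TensorInfo` type, the `place_tensors` function, the `accesses_per_step` hotness estimate, and the greedy policy are all hypothetical and chosen only for illustration. The sketch assumes that more frequently accessed tensors benefit most from local DRAM and that everything that does not fit spills to CXL memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorInfo:
    """Hypothetical description of one offloaded tensor."""
    name: str
    size_bytes: int
    accesses_per_step: int  # assumed hotness estimate, not from the paper


def place_tensors(tensors, dram_budget_bytes):
    """Greedily assign the hottest tensors to DRAM; spill the rest to CXL.

    Returns a dict mapping tensor name to "dram" or "cxl".
    """
    placement = {}
    remaining = dram_budget_bytes
    # Hottest first; ties broken by name so the result is deterministic.
    ranked = sorted(tensors, key=lambda t: (-t.accesses_per_step, t.name))
    for t in ranked:
        if t.size_bytes <= remaining:
            placement[t.name] = "dram"
            remaining -= t.size_bytes
        else:
            placement[t.name] = "cxl"
    return placement


if __name__ == "__main__":
    # Illustrative sizes in arbitrary units, not measurements.
    tensors = [
        TensorInfo("optimizer_state", 8, 2),
        TensorInfo("activations", 6, 1),
        TensorInfo("master_weights", 4, 3),
    ]
    print(place_tensors(tensors, dram_budget_bytes=12))
```

A process-level NUMA policy applies a single rule to every allocation, whereas a per-tensor decision like this one can treat each tensor differently. A real allocator would still need a mechanism to bind each buffer to the chosen NUMA node, which this sketch leaves out.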

Findings

01. Achieves 97-99% of DRAM-only throughput with a single AIC.

02. Provides up to 21% speedup over naive memory placement.

03. Enables scaling of long-context fine-tuning beyond DRAM capacity.

Abstract

The substantial memory requirements of Large Language Models (LLMs), particularly for long-context fine-tuning, have renewed interest in CPU offloading to augment limited GPU memory. However, as context lengths grow, relying on CPU memory for intermediate states introduces a significant bottleneck that can exhaust the capacity of mainstream client platforms. To address this limitation, this work investigates the effectiveness of Compute Express Link (CXL) add-in card (AIC) memory as an extension to CPU memory, enabling larger model sizes and longer context lengths during fine-tuning. Extensive benchmarking reveals two critical challenges. First, current deep learning frameworks such as PyTorch lack fine-grained, per-tensor control over NUMA memory allocation, exposing only coarse, process-level policies. Second, due to this lack of control, when the memory footprint of fine-tuning is…

Taxonomy

Topics: Advancements in Photolithography Techniques · Medical Imaging Techniques and Applications