ProTrain: Efficient LLM Training via Memory-Aware Techniques

Hanmei Yang; Jin Zhou; Yao Fu; Xiaoqun Wang; Ramine Roane; Hui Guan; Tongping Liu

arXiv:2406.08334·cs.DC·April 21, 2026

TL;DR

ProTrain is a system that automatically optimizes memory management for large language model training, improving throughput without manual tuning or accuracy loss.

Contribution

It introduces automated memory management policies tailored to hardware and model architecture, reducing engineering overhead and enhancing training efficiency.

Findings

01. ProTrain improves training throughput by 1.43× to 2.71×.

02. It combines cost models with runtime profiling to search for the best memory management policy.

03. ProTrain maintains training accuracy while improving efficiency.

Abstract

Memory pressure has emerged as a dominant constraint in scaling the training of large language models (LLMs), particularly in resource-constrained environments. While modern frameworks incorporate various memory-saving techniques, they often expose low-level configuration knobs that require manual tuning and specialized system expertise. This not only adds engineering overhead but also risks suboptimal hardware utilization when misconfigured. This paper introduces ProTrain, a novel training system that automatically tailors memory management policies to the model architecture and underlying hardware resources, eliminating the need for manual intervention. The core of ProTrain is its automated memory management that abstracts complex memory management strategies into a few tunable configuration parameters, allowing searches for optimal parameter settings using cost models. ProTrain is…
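
To make the idea of searching a small configuration space with a cost model concrete, here is a minimal sketch in Python. It is not ProTrain's implementation: the two knobs (`n_checkpoint`, `n_offload`), the `Profile` fields, and the linear memory and time models are all hypothetical simplifications chosen for illustration. The search enumerates every setting, discards those whose estimated peak memory exceeds the budget, and keeps the one with the lowest estimated step time.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical per-layer measurements, standing in for runtime profiling."""
    n_layers: int
    act_gb: float       # activation memory per layer
    param_gb: float     # parameter memory per layer
    base_s: float       # forward + backward time per layer
    recompute_s: float  # extra time to recompute one checkpointed layer
    transfer_s: float   # extra time to fetch one offloaded layer


@dataclass(frozen=True)
class Config:
    """Hypothetical knobs: how many layers to checkpoint and to offload."""
    n_checkpoint: int
    n_offload: int


def peak_memory_gb(cfg: Config, p: Profile) -> float:
    # Toy memory model: checkpointed layers keep no activations on the GPU,
    # offloaded layers keep no parameters on the GPU.
    acts = (p.n_layers - cfg.n_checkpoint) * p.act_gb
    params = (p.n_layers - cfg.n_offload) * p.param_gb
    return acts + params


def step_time_s(cfg: Config, p: Profile) -> float:
    # Toy time model: each memory-saving choice adds a fixed per-layer cost.
    return (p.n_layers * p.base_s
            + cfg.n_checkpoint * p.recompute_s
            + cfg.n_offload * p.transfer_s)


def search(p: Profile, budget_gb: float) -> Optional[Config]:
    """Return the fastest config that fits the budget, or None if none fits."""
    best, best_time = None, float("inf")
    for n_ckpt, n_off in product(range(p.n_layers + 1), repeat=2):
        cfg = Config(n_ckpt, n_off)
        if peak_memory_gb(cfg, p) > budget_gb:
            continue  # estimated to run out of memory
        t = step_time_s(cfg, p)
        if t < best_time:
            best, best_time = cfg, t
    return best


profile = Profile(n_layers=4, act_gb=2.0, param_gb=1.0,
                  base_s=0.1, recompute_s=0.03, transfer_s=0.05)
print(search(profile, budget_gb=8.0))
```

With these made-up numbers the search picks `Config(n_checkpoint=2, n_offload=0)`: checkpointing frees more memory per unit of added time than offloading, so it is chosen first. Exhaustive enumeration is only practical because the knob space is tiny, which is the point of reducing memory management to a few tunable parameters.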
