Adacc: An Adaptive Framework Unifying Compression and Activation Recomputation for LLM Training
Ping Chen, Zhuohong Deng, Ping Li, Shuibing He, Hongzi Zhu, Yi Zheng, Zhefeng Wang, Baoxing Huai, Minyi Guo

TL;DR
Adacc is an adaptive framework that unifies activation recomputation and data compression, dynamically optimizing memory usage during LLM training to enhance efficiency without sacrificing accuracy.
Contribution
Adacc introduces a fine-grained, tensor-level adaptive strategy that combines multiple memory optimization techniques with global scheduling and dynamic policy updates.
Findings
Improves training throughput by up to 1.37x
Maintains model accuracy comparable to baseline
Effectively balances memory savings and computational overhead
Abstract
Training large language models (LLMs) is often constrained by GPU memory limitations. To alleviate memory pressure, activation recomputation and data compression have been proposed as two major strategies. However, both approaches have limitations: recomputation introduces significant training overhead, while compression can lead to accuracy degradation and computational inefficiency when applied naively. In this paper, we propose Adacc, the first adaptive memory optimization framework that unifies activation recomputation and data compression to improve training efficiency for LLMs while preserving model accuracy. Unlike existing methods that apply static, rule-based strategies or rely solely on one technique, Adacc makes fine-grained, tensor-level decisions, dynamically selecting between recomputation, retention, and compression based on tensor characteristics and runtime hardware…
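To make the idea of a per-tensor choice between retention, compression, and recomputation concrete, here is a minimal sketch in Python. It is not Adacc's algorithm: the abstract does not describe the decision procedure, so this uses a simple greedy heuristic as a stand-in. All names (`TensorInfo`, `choose_actions`), the cost fields, and the example numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    COMPRESS = "compress"
    RECOMPUTE = "recompute"


@dataclass(frozen=True)
class TensorInfo:
    # All fields are hypothetical profiling inputs, not values from the paper.
    name: str
    size_mb: float              # memory used if the activation is retained (> 0)
    recompute_ms: float         # extra time to recompute it in the backward pass
    compress_ms: float          # extra time to compress and later decompress it
    compress_ratio: float       # compressed size / original size, in (0, 1)
    compressible: bool = True   # False if compression is judged unsafe for accuracy


def choose_actions(tensors, budget_mb):
    """Greedy stand-in policy: retain everything, then apply the cheapest
    memory-saving action (lowest ms per MB saved) until the budget is met."""
    actions = {t.name: Action.RETAIN for t in tensors}
    used_mb = sum(t.size_mb for t in tensors)

    candidates = []
    for t in tensors:
        # Recomputation frees the whole tensor; compression frees only a part.
        options = [(t.recompute_ms, t.size_mb, Action.RECOMPUTE)]
        if t.compressible:
            saved = t.size_mb * (1.0 - t.compress_ratio)
            options.append((t.compress_ms, saved, Action.COMPRESS))
        cost, saved, action = min(options, key=lambda o: o[0] / o[1])
        candidates.append((cost / saved, saved, t.name, action))

    # Cheapest overhead per MB saved goes first.
    candidates.sort(key=lambda c: c[0])
    for _, saved, name, action in candidates:
        if used_mb <= budget_mb:
            break
        actions[name] = action
        used_mb -= saved
    return actions, used_mb


# Hypothetical activations from one transformer layer.
tensors = [
    TensorInfo("attn_scores", 400, recompute_ms=2.0, compress_ms=6.0, compress_ratio=0.25),
    TensorInfo("mlp_hidden", 800, recompute_ms=12.0, compress_ms=1.0, compress_ratio=0.25),
    TensorInfo("layernorm_in", 200, recompute_ms=0.5, compress_ms=3.0,
               compress_ratio=0.5, compressible=False),
]
actions, used_mb = choose_actions(tensors, budget_mb=700)
for name, action in actions.items():
    print(f"{name}: {action.value}")
print(f"estimated footprint: {used_mb} MB")
```

With these made-up numbers, the sketch compresses `mlp_hidden`, recomputes `layernorm_in`, and retains `attn_scores`, which shows how a mixed policy can meet a memory budget without paying the recomputation cost for every tensor. A real system would also need to account for accuracy impact and for runtime conditions, which this toy cost model ignores.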
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
