MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
Jitai Hao, WeiWei Sun, Xin Xin, Qi Meng, Zhumin Chen, Pengjie Ren, Zhaochun Ren

TL;DR
MEFT introduces a memory-efficient fine-tuning method for large language models that leverages activation sparsity and CPU memory to enable larger adapters without requiring extensive GPU resources.
Contribution
The paper proposes a novel approach to fine-tune LLMs with larger adapters by exploiting activation sparsity and CPU memory, improving performance under limited GPU resources.
Findings
Achieves comparable fine-tuning results with limited GPU memory
Utilizes CPU memory and activation sparsity for efficient adapter training
Reduces unnecessary CPU computation and communication overhead with a Mixture of Experts (MoE)-like architecture
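To make the routing idea in the last finding concrete, here is a minimal sketch of how an MoE-like split could shrink the set of adapter rows that need to be scored and moved for one token. It is not the authors' implementation: the top-1 routing rule, the equal-sized contiguous experts, and every name (`route`, `rows_for_expert`, `router_weights`) are assumptions made for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def route(x, router_weights):
    """Return the index of the expert whose router vector scores x highest.

    Top-1 routing is an assumption of this sketch.
    """
    scores = [dot(w, x) for w in router_weights]
    return max(range(len(scores)), key=lambda e: scores[e])


def rows_for_expert(expert, num_rows, num_experts):
    """Rows owned by one expert, assuming an equal contiguous split."""
    size = num_rows // num_experts
    return range(expert * size, (expert + 1) * size)


# Hypothetical sizes: 1024 adapter rows split across 8 experts.
num_rows, num_experts = 1024, 8

# Toy router: expert e simply reads coordinate e of the input.
router_weights = [
    [1.0 if j == e else 0.0 for j in range(num_experts)]
    for e in range(num_experts)
]

x = [0.1, 0.3, 0.9, 0.2, 0.0, 0.4, 0.5, 0.6]
chosen = route(x, router_weights)
candidate_rows = rows_for_expert(chosen, num_rows, num_experts)

print(f"expert {chosen}: {len(candidate_rows)} of {num_rows} rows are candidates")
```

Under these assumptions, only one eighth of the adapter rows are candidates for a given token, which is the kind of reduction in CPU work and transfer volume the finding refers to.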
Abstract
Parameter-Efficient Fine-tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, the fine-tuning performance of PEFT on complex, knowledge-intensive tasks is limited by constrained model capacity, which stems from the small number of additional trainable parameters. To overcome this limitation, we introduce a novel mechanism that fine-tunes LLMs with larger adapters while remaining memory-efficient. This is achieved by leveraging the inherent activation sparsity in the Feed-Forward Networks (FFNs) of LLMs and utilizing the larger capacity of Central Processing Unit (CPU) memory compared to that of the Graphics Processing Unit (GPU). We store and update the parameters of larger adapters on the CPU. Moreover, we employ a Mixture of Experts (MoE)-like architecture to mitigate unnecessary CPU computations and reduce the communication…
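The activation-sparsity idea in the abstract can be illustrated with a small, self-contained sketch. It assumes an FFN-style adapter with a ReLU activation and a top-k selection rule; both are assumptions of this example rather than details confirmed by the text above, and all names (`dense_adapter`, `sparse_adapter`, `keys`, `values`, `top_k`) are hypothetical. For simplicity the sketch scores every row in one place, whereas the abstract describes the adapter parameters as living on the CPU.

```python
import random


def relu(v):
    return v if v > 0.0 else 0.0


def activations(x, keys):
    """ReLU(keys[i] . x) for every adapter neuron i."""
    return [relu(sum(k_j * x_j for k_j, x_j in zip(k, x))) for k in keys]


def dense_adapter(x, keys, values):
    """Reference output: sum over all neurons of act_i * values[i]."""
    acts = activations(x, keys)
    out = [0.0] * len(values[0])
    for a, v in zip(acts, values):
        for j, v_j in enumerate(v):
            out[j] += a * v_j
    return out


def sparse_adapter(x, keys, values, top_k):
    """Use only the top_k most activated neurons (assumed selection rule).

    Returns the output and the indices of the rows actually used; only
    these rows would need to be transferred and updated.
    """
    acts = activations(x, keys)
    ranked = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)
    active = [i for i in ranked[:top_k] if acts[i] > 0.0]
    out = [0.0] * len(values[0])
    for i in active:
        for j, v_j in enumerate(values[i]):
            out[j] += acts[i] * v_j
    return out, active


random.seed(0)
dim, num_neurons = 8, 64  # hypothetical sizes
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_neurons)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_neurons)]
x = [random.gauss(0, 1) for _ in range(dim)]

sparse_out, active = sparse_adapter(x, keys, values, top_k=8)
print(f"rows used: {len(active)} of {num_neurons}")
```

Because ReLU already zeroes out inactive neurons, setting `top_k` to the full neuron count reproduces the dense output (up to floating-point rounding); smaller values trade some accuracy for fewer rows touched per token.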
Taxonomy
Topics: Neural Networks and Reservoir Computing · Photonic and Optical Devices · Advanced Memory and Neural Computing
