Memory-Efficient Fine-Tuning of Transformers via Token Selection
Antoine Simoulin, Namyong Park, Xiaoyi Liu, Grey Yang

TL;DR
TokenTune is a novel method that reduces memory usage during transformer fine-tuning by approximating gradients through a subset of tokens, enabling efficient training of large models without significant performance loss.
Contribution
The paper introduces TokenTune, a memory-efficient fine-tuning technique that approximates gradients by backpropagating through only a selected subset of tokens. It can be combined with existing memory-efficient methods such as LoRA.
Findings
Achieves comparable performance to full fine-tuning on multiple tasks.
Significantly reduces memory footprint during training.
Easily combined with other memory-efficient methods.
Abstract
Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing methods may reduce certain parts of the memory required for fine-tuning, they still require caching all intermediate activations computed in the forward pass to update weights during the backward pass. In this work, we develop TokenTune, a method to reduce memory usage, specifically the memory to store intermediate activations, in the fine-tuning of transformer-based models. During the backward pass, TokenTune approximates the gradient computation by backpropagating through just a subset of input tokens. Thus, with TokenTune, only a subset of intermediate activations are cached during the forward pass. Also, TokenTune can be easily combined with existing…
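The idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single scalar weight `w` applied position-wise to each token, a summed squared-error loss, and uniform random token selection. The names `select_tokens`, `forward`, and `backward` are hypothetical. The sketch shows that when the weight gradient is accumulated only over selected positions, only those positions' inputs need to be cached in the forward pass.

```python
import random


def select_tokens(seq_len, k, seed=0):
    """Pick k token positions uniformly at random (an assumed selection rule)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(seq_len), k))


def forward(w, xs, selected):
    """Toy position-wise layer h_i = w * x_i.

    Outputs are computed for every token, but only the inputs at the
    selected positions are cached for the backward pass.
    """
    outputs = [w * x for x in xs]
    cache = {i: xs[i] for i in selected}
    return outputs, cache


def backward(outputs, targets, cache):
    """Approximate dL/dw for L = sum_i 0.5 * (h_i - t_i)^2.

    The exact gradient sums (h_i - t_i) * x_i over all tokens; here the sum
    runs only over the cached (selected) positions.
    """
    return sum((outputs[i] - targets[i]) * x for i, x in cache.items())


# Toy data: four tokens, all targets zero (values chosen for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
targets = [0.0, 0.0, 0.0, 0.0]
w = 0.5

# Caching every position recovers the exact gradient.
outputs, full_cache = forward(w, xs, range(len(xs)))
full_grad = backward(outputs, targets, full_cache)

# Caching two randomly chosen positions gives an approximation.
selected = select_tokens(len(xs), k=2)
outputs, small_cache = forward(w, xs, selected)
approx_grad = backward(outputs, targets, small_cache)

print("cached activations:", len(small_cache), "of", len(full_cache))
print("full gradient:", full_grad, "approximate gradient:", approx_grad)
```

In this toy setting the number of cached inputs falls from the sequence length to `k`, at the cost of a gradient that omits the unselected tokens' contributions. A real transformer layer mixes tokens through attention, so the actual method has to handle that interaction; the sketch does not.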
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Applications
