Vocabulary-level Memory Efficiency for Language Model Fine-tuning

Miles Williams; Nikolaos Aletras

arXiv:2309.08708 · cs.CL · March 26, 2025

TL;DR

This paper introduces a memory-efficient fine-tuning method for language models that shrinks the embedding matrix by excluding vocabulary that goes unused during fine-tuning. This substantially cuts memory use without affecting downstream task performance.

Contribution

The paper proposes an approach that minimizes the memory footprint of the embedding matrix by exploiting the fact that much of the vocabulary is never used during fine-tuning, making fine-tuning more memory-efficient without loss of performance.

Findings

1. Substantial memory reduction across models and tasks
2. No impact on downstream task performance
3. More efficient use of computational resources

Abstract

The extensive memory footprint of language model (LM) fine-tuning poses a challenge for both researchers and practitioners. LMs use an embedding matrix to represent extensive vocabularies, forming a substantial proportion of the model parameters. While previous work towards memory-efficient fine-tuning has focused on minimizing the number of trainable parameters, reducing the memory footprint of the embedding matrix has yet to be explored. We first demonstrate that a significant proportion of the vocabulary remains unused during fine-tuning. We then propose a simple yet effective approach that leverages this finding to minimize memory usage. We show that our approach provides substantial reductions in memory usage across a wide range of models and tasks. Notably, our approach does not impact downstream task performance, while allowing more efficient use of computational resources.
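The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`used_token_ids`, `reduce_embeddings`, `remap`), the toy vocabulary size, and the toy dataset are all hypothetical, and the embedding matrix is a plain list of lists so that the example has no dependencies. The sketch finds which token IDs occur in a tokenized fine-tuning dataset, keeps only those rows of the embedding matrix, and remaps the dataset to the new, smaller index space.

```python
from typing import Dict, List, Sequence, Tuple


def used_token_ids(dataset: Sequence[Sequence[int]]) -> List[int]:
    """Return the sorted token IDs that occur anywhere in the tokenized dataset."""
    return sorted({tok for example in dataset for tok in example})


def reduce_embeddings(
    embeddings: Sequence[Sequence[float]], keep_ids: Sequence[int]
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the rows for `keep_ids`; also return an old-ID -> new-row mapping."""
    old_to_new = {old: new for new, old in enumerate(keep_ids)}
    reduced = [list(embeddings[old]) for old in keep_ids]
    return reduced, old_to_new


def remap(
    dataset: Sequence[Sequence[int]], old_to_new: Dict[int, int]
) -> List[List[int]]:
    """Rewrite every token ID so that it indexes into the reduced matrix."""
    return [[old_to_new[tok] for tok in example] for example in dataset]


# Toy setup (hypothetical numbers): 10-token vocabulary, 4-dimensional embeddings.
vocab_size, hidden = 10, 4
embeddings = [[float(i)] * hidden for i in range(vocab_size)]
dataset = [[2, 7, 7, 3], [3, 9, 2]]  # already-tokenized fine-tuning examples

keep = used_token_ids(dataset)
reduced, old_to_new = reduce_embeddings(embeddings, keep)
remapped = remap(dataset, old_to_new)

# Fraction of embedding rows (and hence embedding memory) no longer held.
saved = 1 - len(reduced) / vocab_size
print(f"kept {len(reduced)} of {vocab_size} rows, saving {saved:.0%} of embedding memory")
```

In a real setting, special tokens that the tokenizer may emit at inference time would presumably need to be kept as well, and the fine-tuned rows would need to be written back into the full matrix afterwards; both points are assumptions about practical use rather than details stated in the abstract.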

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices

Methods: Pruning