Dynamic Vocabulary Pruning in Early-Exit LLMs

Jort Vincenti; Karim Abdel Sadek; Joan Velja; Matteo Nulli; Metod Jazbec

arXiv:2410.18952·cs.CL·October 31, 2024

Open Access · 1 Repo

TL;DR

This paper introduces a method for dynamically pruning the vocabulary at test time, for each token, in early-exit large language models. The pruning makes the confidence estimation behind exit decisions cheaper while maintaining competitive performance.

Contribution

The paper proposes post-hoc dynamic vocabulary pruning: the vocabulary is pruned at one of the initial layers, and the smaller vocabulary is reused for confidence estimation throughout the rest of the forward pass in early-exit LLMs.

Findings

01. Improved inference efficiency in early-exit LLMs

02. Maintained competitive performance with vocabulary pruning

03. Reduced computational cost of confidence estimation

Abstract

Increasing the size of large language models (LLMs) has been shown to lead to better performance. However, this comes at the cost of slower and more expensive inference. Early-exiting is a promising approach for improving the efficiency of LLM inference by enabling next token prediction at intermediate layers. Yet, the large vocabulary size in modern LLMs makes the confidence estimation required for exit decisions computationally expensive, diminishing the efficiency gains. To address this, we propose dynamically pruning the vocabulary at test time for each token. Specifically, the vocabulary is pruned at one of the initial layers, and the smaller vocabulary is then used throughout the rest of the forward pass. Our experiments demonstrate that such post-hoc dynamic vocabulary pruning improves the efficiency of confidence estimation in early-exit LLMs while maintaining competitive…
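
The mechanism described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the top-k pruning criterion, the max-softmax-probability confidence measure, the fixed exit threshold, and every function name below (including the hypothetical `make_toy_inputs` helper) are assumptions chosen for illustration. The point it shows is that, once the vocabulary has been pruned to k tokens at an early layer, each later confidence check only needs k dot products instead of one per vocabulary entry.

```python
import math
import random


def logits(hidden, unembed_rows):
    """Dot each unembedding row with the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed_rows]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prune_vocab(hidden, unembed, k):
    """Ids of the k highest-scoring tokens at the pruning layer (assumed criterion)."""
    scores = logits(hidden, unembed)
    ranked = sorted(range(len(unembed)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def confidence(hidden, unembed, kept_ids):
    """Max softmax probability over the kept tokens, and the matching token id."""
    probs = softmax(logits(hidden, [unembed[i] for i in kept_ids]))
    best = max(range(len(probs)), key=lambda j: probs[j])
    return probs[best], kept_ids[best]


def early_exit_decode(hidden_per_layer, unembed, prune_layer, k, threshold):
    """Return (token_id, exit_layer) for a single token position.

    The vocabulary is pruned once at `prune_layer`; every later exit check
    reuses the same k token ids. The last layer always exits.
    """
    if not 0 <= prune_layer < len(hidden_per_layer):
        raise ValueError("prune_layer must index an existing layer")
    kept_ids = prune_vocab(hidden_per_layer[prune_layer], unembed, k)
    last = len(hidden_per_layer) - 1
    for layer in range(prune_layer, last + 1):
        conf, token = confidence(hidden_per_layer[layer], unembed, kept_ids)
        if conf >= threshold or layer == last:
            return token, layer


def make_toy_inputs(seed, vocab_size, dim, num_layers):
    """Hypothetical helper: random unembedding matrix and per-layer hidden states."""
    rng = random.Random(seed)
    unembed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]
    hiddens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
    return unembed, hiddens
```

For example, `early_exit_decode(hiddens, unembed, prune_layer=1, k=5, threshold=0.9)` on inputs from `make_toy_inputs` returns the predicted token id and the layer at which the toy model exited. In this sketch the confidence over the pruned vocabulary is never lower than the full-vocabulary confidence for the same hidden state, because the softmax normalizes over fewer tokens, so a threshold tuned for the full vocabulary may need to be adjusted.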

Code & Models

Repositories

matteonulli/vocabulary_pruning · PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Second Language Acquisition and Learning · Text Readability and Simplification

Methods: Pruning