VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models

Hanling Zhang; Yayu Zhou; Tongcheng Fang; Zhihang Yuan; Guohao Dai; Wanli Ouyang; Yu Wang

arXiv:2508.15229 · cs.CL · April 21, 2026

TL;DR

VocabTailor is a dynamic vocabulary selection framework for small language models that cuts vocabulary-related memory usage by up to 99% while maintaining task performance, enabling more efficient deployment on resource-constrained devices.

Contribution

It introduces a decoupled, hybrid static-dynamic vocabulary selection method that selects vocabulary components on demand and outperforms static pruning techniques.
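The hybrid static-dynamic idea can be illustrated with a toy sketch. It assumes the static part is a fixed, task-level token set and the dynamic part is the set of token ids in the current input; the function names, the toy sizes, and that selection rule are hypothetical, since this listing does not describe VocabTailor's actual criteria. Only the selected LM-head rows are scored, and the winning local index is mapped back to a full-vocabulary id.

```python
from typing import List, Sequence, Set

def build_tailored_vocab(static_ids: Set[int], input_ids: Sequence[int]) -> List[int]:
    """Hypothetical hybrid selection: union of a fixed task-level set (static)
    and the ids present in the current input (dynamic), sorted."""
    return sorted(set(static_ids) | set(input_ids))

def slice_lm_head(lm_head: List[List[float]], vocab: List[int]) -> List[List[float]]:
    """Keep only the LM-head rows belonging to the tailored vocabulary."""
    return [lm_head[t] for t in vocab]

def greedy_next_token(hidden: List[float], sub_head: List[List[float]], vocab: List[int]) -> int:
    """Score only the tailored rows, then map the best local index back
    to its id in the full vocabulary."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in sub_head]
    best_local = max(range(len(logits)), key=logits.__getitem__)
    return vocab[best_local]

# Toy model: 8-token vocabulary, hidden size 2, row t = [t, -t] (illustrative only).
lm_head = [[float(t), float(-t)] for t in range(8)]
vocab = build_tailored_vocab(static_ids={0, 1}, input_ids=[5, 3, 5])
sub_head = slice_lm_head(lm_head, vocab)
next_id = greedy_next_token([1.0, 0.0], sub_head, vocab)
print(vocab, len(sub_head), next_id)  # [0, 1, 3, 5] 4 5
```

In the toy example the full-vocabulary argmax would be token 7, which lies outside the tailored set. Any reduced vocabulary makes this trade-off, so the static part presumably has to cover the tokens a task is likely to emit.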

Findings

01. Achieves up to 99% reduction in vocabulary-related memory usage.
02. Maintains minimal or no performance degradation across diverse tasks.
03. Outperforms existing static vocabulary pruning methods.
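To make "vocabulary-related memory" concrete: the embedding table and the LM head each store one hidden-size vector per vocabulary entry, so their footprint scales with V × d (one matrix if the weights are tied, two if not). The sketch below uses hypothetical sizes (V = 150,000, d = 1,024, 2-byte parameters, untied) and a hypothetical number of resident rows; it illustrates the arithmetic behind a percentage reduction, not the configuration the paper reports.

```python
def vocab_bytes(vocab_size: int, hidden: int, bytes_per_param: int = 2, tied: bool = False) -> int:
    """Bytes held by the embedding table plus the LM head
    (a single shared matrix when the weights are tied)."""
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden * bytes_per_param

def reduction(full_rows: int, resident_rows: int) -> float:
    """Fraction of vocabulary-related memory saved when only resident_rows
    of each matrix are kept (same hidden size and dtype)."""
    return 1.0 - resident_rows / full_rows

# Hypothetical SLM-sized configuration (not taken from the paper).
full = vocab_bytes(150_000, 1_024)      # 614,400,000 bytes (~614 MB)
resident = vocab_bytes(3_000, 1_024)    # 12,288,000 bytes (~12 MB)
saved = reduction(150_000, 3_000)       # ~0.98
print(full, resident, round(saved, 2))  # 614400000 12288000 0.98
```

Because both matrices scale linearly in V, the saving depends only on the ratio of resident rows to the full vocabulary.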

Abstract

Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge device deployment. A substantial portion of SLMs' memory footprint stems from vocabulary-related components, particularly embeddings and language modeling (LM) heads, due to large vocabulary sizes. Existing static vocabulary pruning, while reducing memory usage, suffers from rigid, one-size-fits-all designs that cause information loss during the prefill stage and lack flexibility. In this work, we identify two key principles underlying the vocabulary reduction challenge: the lexical locality principle, the observation that only a small subset of tokens is required during any single inference, and the asymmetry in computational characteristics between the vocabulary-related components of SLMs. Based on these insights, we introduce…
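The two principles can be illustrated with simplified accounting. An embedding layer is a gather, so a request only reads the rows for ids that actually appear; an LM head scores every vocabulary row to choose each generated token. The helper names, the token ids, and the UNK-replacement rule used to model static pruning's prefill loss are hypothetical assumptions for illustration; the abstract excerpt is truncated before it describes how VocabTailor acts on these observations.

```python
from typing import List, Sequence, Set, Tuple

def rows_read(prompt_ids: Sequence[int], generated_ids: Sequence[int], vocab_size: int) -> Tuple[int, int]:
    """Simplified count of distinct rows each vocabulary component reads in one request."""
    # Embedding: a lookup touches only the ids that appear in the sequence (lexical locality).
    embedding_rows = len(set(prompt_ids) | set(generated_ids))
    # LM head: every generated token is chosen by scoring all vocabulary rows.
    lm_head_rows = vocab_size if generated_ids else 0
    return embedding_rows, lm_head_rows

def static_prune_inputs(prompt_ids: Sequence[int], kept_ids: Set[int], unk_id: int) -> List[int]:
    """Hypothetical static pruning: ids outside a fixed kept set cannot be
    embedded and are replaced by unk_id, losing input information at prefill."""
    return [t if t in kept_ids else unk_id for t in prompt_ids]

prompt, generated = [10, 42, 10, 7], [42, 99]
print(rows_read(prompt, generated, vocab_size=150_000))          # (4, 150000)
print(static_prune_inputs(prompt, kept_ids={7, 10}, unk_id=0))   # [10, 0, 10, 7]
```

Under this accounting, embedding reads grow only with the distinct tokens in a request, while the LM head reads the full vocabulary at every step; this is one plausible reading of the asymmetry the abstract describes.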

Code & Models

Repositories

AwakenedInsects/VocabTailor (GitHub)
