Retrofitting Large Language Models with Dynamic Tokenization

Darius Feher; Ivan Vulić; Benjamin Minixhofer

arXiv:2411.18553·cs.CL·June 12, 2025

TL;DR

This paper retrofits existing language models with dynamic tokenization: token boundaries are decided on the fly from the input text, which improves efficiency and fairness across languages with minimal performance loss.

Contribution

It proposes a dynamic tokenization method that merges frequent subword sequences in a batch and computes embeddings for the merged tokens with a pre-trained hypernetwork, reducing sequence length and improving multilingual fairness in LMs.

Findings

01. Reduces token sequence lengths by >20% on average across 14 languages in encoder-style models (e.g., XLM-R), while degrading performance by less than 2%

02. Achieves up to 17% sequence length reduction in decoder-style models (e.g., Mistral-7B) with minimal performance loss

03. Enhances inference speed and language fairness in large language models

Abstract

Current language models (LMs) use a fixed, static subword tokenizer. This default choice typically results in degraded efficiency and language capabilities, especially in languages other than English. To address this issue, we challenge the static design and propose retrofitting LMs with dynamic tokenization: a way to dynamically decide on token boundaries based on the input text via a subword-merging algorithm inspired by byte-pair encoding. We merge frequent subword sequences in a batch, then apply a pre-trained embedding-prediction hypernetwork to compute the token embeddings on-the-fly. For encoder-style models (e.g., XLM-R), this on average reduces token sequence lengths by >20% across 14 languages while degrading performance by less than 2%. The same method applied to pre-filling and scoring in decoder-style models (e.g., Mistral-7B) results in minimal performance degradation at…
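The batch-level merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `merge_pair` and `batch_merge`, the `num_merges` parameter, and the stopping rule (stop once no adjacent pair occurs at least twice) are all assumptions made for illustration. The sketch also ignores word boundaries and omits the hypernetwork that would compute embeddings for the newly merged tokens.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one joined token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def batch_merge(batch, num_merges):
    """Greedy BPE-style merging over one batch of subword sequences (hypothetical helper).

    Each round counts adjacent subword pairs across the whole batch and merges
    the most frequent one, so the merged vocabulary depends on the input text.
    """
    batch = [list(seq) for seq in batch]
    for _ in range(num_merges):
        counts = Counter()
        for seq in batch:
            counts.update(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # assumed stopping rule: a pair seen once is not "frequent"
            break
        batch = [merge_pair(seq, pair) for seq in batch]
    return batch


if __name__ == "__main__":
    batch = [["un", "break", "able"], ["un", "break", "ing"], ["break", "able"]]
    merged = batch_merge(batch, num_merges=2)
    before = sum(len(s) for s in batch)
    after = sum(len(s) for s in merged)
    print(merged)
    print(f"tokens: {before} -> {after}")
```

In the described method, each merged token would then need an embedding, which the paper obtains from a pre-trained embedding-prediction hypernetwork rather than from a fixed lookup table; that step is outside the scope of this sketch.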

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Algorithms

Methods: HyperNetwork · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · XLM-R