Zero-Shot Tokenizer Transfer

Benjamin Minixhofer; Edoardo Maria Ponti; Ivan Vulić

arXiv:2405.07883·cs.CL·October 29, 2025

TL;DR

This paper introduces Zero-Shot Tokenizer Transfer (ZeTT), the problem of swapping a language model's tokenizer for an arbitrary new one without degrading performance. The proposed solution trains a hypernetwork that takes a tokenizer as input and predicts embeddings for its tokens, making models more flexible across natural and programming languages.

Contribution

We propose a hypernetwork-based approach for zero-shot tokenizer transfer that generalizes to new tokenizers and reduces sequence length, improving model flexibility and efficiency.
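The sequence-length effect can be illustrated with a toy example. The sketch below is not the paper's tokenizer or evaluation: `greedy_tokenize`, both vocabularies, and the sample snippet are hypothetical, chosen only to show how a vocabulary matched to the input domain yields fewer tokens for the same text.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation (illustrative only).

    Falls back to single characters when no vocabulary item matches.
    """
    tokens, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


# Hypothetical vocabularies: one tuned to English text, one tuned to code.
english_vocab = {"the", "ing", "er", " "}
code_vocab = {"def ", "return ", "self", "(", ")", ":"}

snippet = "def f(self): return self"
english_len = len(greedy_tokenize(snippet, english_vocab))
code_len = len(greedy_tokenize(snippet, code_vocab))
print(english_len, code_len)
```

With these toy vocabularies the code-oriented one segments the snippet into far fewer tokens, which is the kind of efficiency gain a tokenizer swap is meant to unlock.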

Findings

01. The hypernetwork effectively predicts embeddings for new tokenizers.

02. Performance stays close to that of the original models on multilingual and coding tasks.

03. Remaining gaps can be closed with less than 1B tokens of additional training.

Abstract

Language models (LMs) are bound to their tokenizer, which maps raw text to a sequence of vocabulary items (tokens). This restricts their flexibility: for example, LMs trained primarily on English may still perform well in other natural and programming languages, but have vastly decreased efficiency due to their English-centric tokenizer. To mitigate this, we should be able to swap the original LM tokenizer with an arbitrary one, on the fly, without degrading performance. Hence, in this work we define a new problem: Zero-Shot Tokenizer Transfer (ZeTT). The challenge at the core of ZeTT is finding embeddings for the tokens in the vocabulary of the new tokenizer. Since prior heuristics for initializing embeddings often perform at chance level in a ZeTT setting, we propose a new solution: we train a hypernetwork taking a tokenizer as input and predicting the corresponding embeddings. We…
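To make the core challenge concrete, here is a minimal sketch of what "finding embeddings for the tokens in the vocabulary of the new tokenizer" means as an interface: a new vocabulary goes in, one embedding per new token comes out. The function below is a simple averaging heuristic used purely for illustration; it is an assumption, not the paper's hypernetwork and not necessarily one of the heuristics the paper evaluates. All names (`decompose`, `mean_init`) and the toy embeddings are hypothetical.

```python
def decompose(token, old_vocab):
    """Greedy longest-match split of a new token into old-vocabulary pieces."""
    pieces, i = [], 0
    while i < len(token):
        for n in range(len(token) - i, 0, -1):
            if token[i:i + n] in old_vocab:
                pieces.append(token[i:i + n])
                i += n
                break
        else:
            # No old-vocabulary item covers this position.
            raise KeyError(f"cannot decompose {token!r} at position {i}")
    return pieces


def mean_init(new_vocab, old_embeddings):
    """Illustrative heuristic: average the embeddings of a token's old pieces."""
    new_embeddings = {}
    for token in new_vocab:
        pieces = decompose(token, old_embeddings)
        dim = len(old_embeddings[pieces[0]])
        new_embeddings[token] = [
            sum(old_embeddings[p][d] for p in pieces) / len(pieces)
            for d in range(dim)
        ]
    return new_embeddings


# Toy 2-dimensional embeddings for a hypothetical original vocabulary.
old_embeddings = {"to": [1.0, 0.0], "ken": [0.0, 1.0], "s": [0.5, 0.5]}
new_embeddings = mean_init(["token", "tokens"], old_embeddings)
print(new_embeddings)
```

A hypernetwork, as described in the abstract, would fill the same role as `mean_init` but with a trained model in place of the fixed average.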

Code & Models

Models

benjamin/zett-hypernetwork-multilingual-Mistral-7B-v0.1 (Hugging Face model · 7 downloads · 2 likes)

Videos

Zero-Shot Tokenizer Transfer · SlidesLive

Taxonomy

Topics: Organoboron and organosilicon chemistry

Methods: Balanced Selection · HyperNetwork