zip2zip: Inference-Time Adaptive Tokenization via Online Compression
Saibo Geng, Nathan Ranchin, Yunzhen Yao, Maxime Peyrard, Chris Wendler, Michael Gastpar, Robert West

TL;DR
zip2zip introduces a method for inference-time adaptive tokenization using online compression, enabling large language models to dynamically optimize their token vocabularies for specific contexts, reducing token counts and improving efficiency.
Contribution
The paper presents zip2zip, an approach that adapts tokenization dynamically at inference time using online compression, in contrast to the static tokenizers that most models rely on.
Findings
Reduces input and output tokens by 15-40%.
Enables LLMs to adapt tokenization to specific contexts.
Achieves this with 10 GPU-hours of finetuning.
Abstract
Tokenization efficiency plays a critical role in the performance and cost of large language models (LLMs), yet most models rely on static tokenizers optimized on general-purpose corpora. These tokenizers' fixed vocabularies often fail to adapt to domain- or language-specific inputs, leading to longer token sequences and higher computational costs. We introduce zip2zip, a novel method for achieving context-adaptive tokenization in LLMs at inference time. Leveraging an online data compression algorithm (Lempel-Ziv-Welch), zip2zip dynamically expands its active vocabulary at inference time by continuously replacing fragmented token sequences with more compact hypertokens, which it can immediately output during generation. In doing so, the model refines its internal tokenization scheme to match the token distribution of the current context, reducing redundancy and improving representational…
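The LZW-based merging described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `base_vocab_size` parameter, and the `max_merge_len` cap are assumptions introduced for illustration only. It shows how repeated token sequences could be replaced by hypertoken IDs allocated above the base vocabulary as the context is read.

```python
def lzw_compress_tokens(token_ids, base_vocab_size, max_merge_len=3):
    """LZW-style pass over token IDs (illustrative sketch, not the paper's code).

    Hypertoken IDs are allocated starting at base_vocab_size (an assumption).
    max_merge_len is a hypothetical cap on how many base tokens one
    hypertoken may cover.
    """
    codebook = {}            # tuple of base token IDs -> hypertoken ID
    next_id = base_vocab_size
    output = []
    current = ()             # longest known sequence matched so far

    def emit(seq):
        # A single base token is emitted as-is; longer matches use the codebook.
        output.append(seq[0] if len(seq) == 1 else codebook[seq])

    for tok in token_ids:
        candidate = current + (tok,)
        if len(candidate) == 1 or candidate in codebook:
            current = candidate          # keep extending the match
            continue
        emit(current)                    # flush the longest known match
        if len(candidate) <= max_merge_len:
            codebook[candidate] = next_id  # register a new hypertoken
            next_id += 1
        current = (tok,)
    if current:
        emit(current)
    return output, codebook


def expand_hypertokens(compressed, codebook):
    """Map hypertoken IDs back to the base tokens they stand for."""
    inverse = {hid: seq for seq, hid in codebook.items()}
    expanded = []
    for t in compressed:
        expanded.extend(inverse.get(t, (t,)))
    return expanded


tokens = [1, 2, 1, 2, 1, 2, 1, 2]
compressed, codebook = lzw_compress_tokens(tokens, base_vocab_size=10)
print(compressed)                        # [1, 2, 10, 12, 2]
print(len(compressed), "vs", len(tokens))  # 5 vs 8
```

In this toy run the repetitive eight-token input shrinks to five IDs, and `expand_hypertokens` recovers the original sequence. How the model embeds and predicts such hypertokens is not covered by this sketch.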
Code & Models
- 🤗 nathanrchn/zip2zip-test (model)
- 🤗 Saibo-creator/zip2zip-evqn-7000 (model)
- 🤗 Saibo-creator/zip2zip-evqn-7000-new (model)
- 🤗 Saibo-creator/zip2zip-Phi-3.5-mini-instruct-v0.1 (model)
- 🤗 Saibo-creator/zip2zip-Llama-3.2-3B-Instruct-v0.1 (model)
- 🤗 Saibo-creator/zip2zip-Llama-3.2-1B-Instruct-v0.1 (model)
- 🤗 Saibo-creator/zip2zip-Llama-3.1-8B-Instruct-v0.1 (model)
- 🤗 epfl-dlab/zip2zip-Llama-3.1-8B-Instruct-v0.1 (model)
- 🤗 epfl-dlab/zip2zip-Llama-3.2-1B-Instruct-v0.1 (model)
- 🤗 epfl-dlab/zip2zip-Llama-3.2-3B-Instruct-v0.1 (model)
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
