Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit
Charles Goddard, Fernando Fernandes Neto

TL;DR
This paper introduces a training-free method using Orthogonal Matching Pursuit to transplant tokenizers in pretrained large language models, enabling effective cross-tokenizer adaptation without retraining.
Contribution
It proposes a training-free approach to tokenizer transplantation in LLMs that reconstructs each new token's embedding as a sparse linear combination of shared-token embeddings, preserving the base model's performance across different tokenization schemes.
Findings
OMP outperforms existing zero-shot methods in preserving model performance.
The method effectively bridges large tokenizer discrepancies without gradient updates.
OMP enables practical applications like cross-tokenizer knowledge transfer and vocabulary adaptation.
Abstract
We present a training-free method to transplant tokenizers in pretrained large language models (LLMs) by reconstructing unseen token embeddings via Orthogonal Matching Pursuit (OMP). Specifically, we approximate each out-of-vocabulary token as a sparse linear combination of shared tokens, in two phases: first, compute each new token's representation in the donor embedding space with a small dictionary of shared anchor tokens, then transfer these same sparse coefficients back into the base model's embedding space. On two challenging cross-tokenizer tasks--Llama→Mistral NeMo (12B) and Qwen→Llama (1B)--we show that OMP achieves the best zero-shot preservation of the base model's performance across multiple benchmarks, while other zero-shot approaches degrade significantly. Compared to baselines (zero-init, mean-init, and existing approaches like WECHSEL, FOCUS, ZETT), OMP…
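The two phases described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`omp_coefficients`, `transplant_embedding`), the sparsity budget `k`, the use of raw (unnormalized) inner products for anchor selection, and the toy data are all assumptions made for illustration. The sketch assumes the rows of `donor_anchors` and `base_anchors` are the embeddings of the same shared tokens, in the same order, in the donor and base models respectively.

```python
import numpy as np


def omp_coefficients(target, anchors, k):
    """Greedy OMP: approximate `target` (d,) with at most k rows of `anchors` (n, d).

    Returns the selected row indices and their least-squares coefficients.
    """
    target = np.asarray(target, dtype=float)
    residual = target.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        # Score every anchor by its correlation with the current residual.
        scores = np.abs(anchors @ residual)
        if support:
            scores[support] = -np.inf  # never select the same anchor twice
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients jointly (the "orthogonal" step).
        basis = anchors[support].T  # shape (d, len(support))
        coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = target - basis @ coeffs
        if np.linalg.norm(residual) < 1e-10:
            break
    return support, coeffs


def transplant_embedding(donor_new, donor_anchors, base_anchors, k=8):
    """Phase 1: sparse-code the new token over shared anchors in the donor space.
    Phase 2: reuse the same coefficients on the matching base-model anchors.
    """
    support, coeffs = omp_coefficients(donor_new, donor_anchors, k)
    return coeffs @ base_anchors[support]


# Toy data (hypothetical): the base space is an exact linear image of the
# donor space, so transferring the coefficients should be exact here.
rng = np.random.default_rng(0)
donor_anchors = rng.normal(size=(20, 6))   # 20 shared tokens, donor dim 6
mixing = rng.normal(size=(6, 4))           # made-up donor-to-base map
base_anchors = donor_anchors @ mixing      # same 20 tokens, base dim 4
donor_new = 0.7 * donor_anchors[3] - 0.2 * donor_anchors[11]

base_new = transplant_embedding(donor_new, donor_anchors, base_anchors, k=6)
print(np.allclose(base_new, donor_new @ mixing))
```

In real models the two embedding spaces are not related by an exact linear map, so the transferred vector is only an approximation. The budget `k` trades sparsity against reconstruction error in the donor space.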
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
