Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Sander Land, Max Bartolo

TL;DR
This paper introduces methods for automatically identifying under-trained tokens (so-called glitch tokens) in large language models, with the aim of improving model safety and reliability.
Contribution
It presents novel techniques that combine tokenizer analysis, model weight-based indicators, and prompting to detect problematic tokens in LLMs.
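As a rough illustration of what a weight-based indicator could look like, the sketch below flags tokens whose embedding vector points in nearly the same direction as the mean embedding of a few tokens assumed to be unused. This is a minimal sketch under stated assumptions, not the authors' implementation: the cosine-distance criterion, the `flag_candidates` helper, the threshold, and the toy embedding values are all hypothetical and chosen only for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    denom = l2_norm(a) * l2_norm(b)
    if denom == 0.0:
        return 1.0
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / denom


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def flag_candidates(embeddings, reference_ids, threshold=0.1):
    """Return ids of tokens whose embedding lies within `threshold` cosine
    distance of the mean embedding of the reference tokens.

    `reference_ids` are token ids ASSUMED to be unused in training
    (a hypothetical input for this sketch). Results are sorted from
    most to least suspicious, and the reference ids are excluded.
    """
    reference = set(reference_ids)
    ref_mean = mean_vector([embeddings[i] for i in reference])
    scores = {
        i: cosine_distance(vec, ref_mean)
        for i, vec in enumerate(embeddings)
        if i not in reference
    }
    flagged = [i for i, d in scores.items() if d <= threshold]
    return sorted(flagged, key=lambda i: scores[i])


# Toy embedding table (made-up values): ids 0-2 stand in for trained tokens,
# ids 4-5 for known-unused tokens, and id 3 for an unlabeled token whose
# vector resembles the unused ones.
TOY_EMBEDDINGS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, -0.5, 0.2],
    [0.011, 0.009, 0.010],
    [0.010, 0.010, 0.010],
    [0.012, 0.010, 0.008],
]
TOY_REFERENCE_IDS = [4, 5]

candidates = flag_candidates(TOY_EMBEDDINGS, TOY_REFERENCE_IDS)
print(candidates)
```

In a real setting the embedding table would come from the model's weights, and any flagged token would still need to be checked, for example by prompting the model to repeat it, before being treated as under-trained.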
Findings
Under-trained tokens are prevalent across various models.
The proposed methods effectively identify glitch tokens.
Detection improves model safety and robustness.
Abstract
The disconnect between tokenizer creation and model training in language models allows for specific inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted model behaviour. Although such 'glitch tokens', tokens present in the tokenizer vocabulary but that are nearly or entirely absent during model training, have been observed across various models, a reliable method to identify and address them has been missing. We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting this issue of detecting under-trained tokens. Through a combination of tokenizer analysis, model weight-based indicators, and prompting techniques, we develop novel and effective methods for automatically detecting these problematic tokens. Our findings demonstrate the prevalence of such tokens across a diverse set of models and provide insights into improving the…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
