Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

Sander Land; Max Bartolo

arXiv:2405.05417 · cs.CL · September 30, 2024

TL;DR

This paper introduces methods to automatically identify under-trained tokens, like glitch tokens, in large language models to improve their safety and reliability.

Contribution

It presents novel techniques combining tokenizer analysis, model weights, and prompting to detect problematic tokens in LLMs.
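
As a rough illustration of what tokenizer analysis can look like, the sketch below checks whether each vocabulary entry's own text encodes back to that entry. This is an assumption made for illustration and not a description of the paper's procedure: the toy vocabulary, the lowercasing normalizer, and the names `encode` and `unreachable_tokens` are all hypothetical.

```python
def encode(text, vocab):
    """Toy tokenizer (hypothetical): lowercase, then greedy longest-match."""
    text = text.lower()
    ids, i = [], 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {text[i]!r}")
    return ids


def unreachable_tokens(vocab):
    """Return ids of tokens whose own string does not encode back to them."""
    unreachable = []
    for piece, token_id in vocab.items():
        try:
            ids = encode(piece, vocab)
        except ValueError:
            ids = None
        if ids != [token_id]:
            unreachable.append(token_id)
    return sorted(unreachable)


# "AB" can never be produced because the normalizer lowercases all input.
toy_vocab = {"a": 0, "b": 1, "ab": 2, "AB": 3}
print(unreachable_tokens(toy_vocab))
```

A token that fails this round trip cannot appear in tokenized training text under this toy tokenizer, which makes it a natural candidate for closer inspection.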

Findings

01. Under-trained tokens are prevalent across various models.

02. The proposed methods effectively identify glitch tokens.

03. Detecting these tokens provides insights into improving model safety and robustness.

Abstract

The disconnect between tokenizer creation and model training in language models allows for specific inputs, such as the infamous SolidGoldMagikarp token, to induce unwanted model behaviour. Although such 'glitch tokens', tokens present in the tokenizer vocabulary but that are nearly or entirely absent during model training, have been observed across various models, a reliable method to identify and address them has been missing. We present a comprehensive analysis of Large Language Model tokenizers, specifically targeting this issue of detecting under-trained tokens. Through a combination of tokenizer analysis, model weight-based indicators, and prompting techniques, we develop novel and effective methods for automatically detecting these problematic tokens. Our findings demonstrate the prevalence of such tokens across a diverse set of models and provide insights into improving the…
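
To make the idea of a model weight-based indicator more concrete, here is a minimal sketch under stated assumptions. It assumes that rarely trained tokens end up with embedding rows that resemble those of tokens already known to be unused, and it ranks all tokens by cosine distance to the mean of such reference rows. The abstract does not specify the indicator, so the scoring rule, the toy matrix, and the names `mean_vector`, `cosine_distance`, and `rank_candidates` are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_candidates(embeddings, reference_ids):
    """Rank token ids from most to least similar to the reference mean.

    `reference_ids` are tokens assumed (hypothetically) to be unused,
    e.g. reserved placeholder entries in the vocabulary.
    """
    reference = mean_vector([embeddings[i] for i in reference_ids])
    scored = [
        (cosine_distance(row, reference), token_id)
        for token_id, row in enumerate(embeddings)
    ]
    return [token_id for _, token_id in sorted(scored)]


# Toy 2-dimensional embedding matrix; token 0 plays the known-unused token.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [1.0, 0.05]]
print(rank_candidates(toy_embeddings, reference_ids=[0]))
```

Tokens near the top of such a ranking are only candidates. Confirming them would require a further step, such as the prompting techniques the abstract mentions.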

Code & Models

Repositories

cohere-ai/magikarp (Official)

Videos

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training