Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers
Eugene Jang, Kimin Lee, Jin-Woo Chung, Keuntae Park, Seungwon Shin

TL;DR
This paper uncovers vulnerabilities in byte-level BPE tokenizers by demonstrating that improbable bigrams can induce hallucinations in language models, highlighting the fragility of incomplete tokens and the importance of robust tokenization.
Contribution
It introduces the concept of improbable bigrams to expose tokenizer vulnerabilities and shows how alternative tokenizations can reduce hallucination rates.
Findings
Improbable bigrams cause increased hallucinations in models.
Using an alternative tokenization of the same phrases sharply reduces hallucination rates (a 90% reduction in Llama3.1).
Incomplete tokens rely heavily on their adjacent tokens and are fragile when paired with unfamiliar ones.
Abstract
Tokenization is a crucial step that bridges human-readable text with model-readable discrete tokens. However, recent studies have revealed that tokenizers can be exploited to elicit unwanted model behaviors. In this work, we investigate incomplete tokens, i.e., undecodable tokens with stray bytes resulting from byte-level byte-pair encoding (BPE) tokenization. We hypothesize that such tokens are heavily reliant on their adjacent tokens and are fragile when paired with unfamiliar tokens. To demonstrate this vulnerability, we introduce improbable bigrams: out-of-distribution combinations of incomplete tokens designed to exploit their dependency. Our experiments show that improbable bigrams are significantly prone to hallucinatory behaviors. Surprisingly, the same phrases have drastically lower rates of hallucination (90% reduction in Llama3.1) when an alternative tokenization is used. We…
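To make the notion of an incomplete token concrete, here is a minimal Python sketch. It does not use a real tokenizer: the byte strings are hand-made by splitting UTF-8 text in the middle of a character, and the helper `is_incomplete` is a hypothetical name introduced only for this illustration. The last step pairs fragments from two different scripts, loosely in the spirit of an improbable bigram; it is not the authors' construction procedure.

```python
def is_incomplete(token_bytes: bytes) -> bool:
    """Return True if the bytes are not valid UTF-8 on their own."""
    try:
        token_bytes.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


# Hand-made example (assumption: not taken from any real vocabulary).
text = "안녕"                     # two Hangul syllables, 3 bytes each in UTF-8
raw = text.encode("utf-8")

# Split in the middle of the second character, as a byte-level BPE merge might.
left, right = raw[:4], raw[4:]    # left ends with a stray lead byte

print(is_incomplete(left))        # stray trailing byte: undecodable alone
print(is_incomplete(right))       # only continuation bytes: undecodable alone
print((left + right).decode("utf-8"))  # together they decode cleanly

# Pair the left fragment with continuation bytes from a different script.
# The result is still valid UTF-8, but it is a combination that the left
# fragment would rarely, if ever, appear next to in ordinary text.
other_right = "あ".encode("utf-8")[1:]
odd_pair = (left + other_right).decode("utf-8")
print(len(odd_pair))
```

Each fragment only becomes readable text together with its neighbor, which is the dependency on adjacent tokens that the abstract hypothesizes.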
Taxonomy
Topics: Security and Verification in Computing · Cryptography and Data Security · Adversarial Robustness in Machine Learning
Methods: Byte Pair Encoding
