Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness
Zhipeng Yang, Shu Yang, Lijie Hu, Di Wang

TL;DR
This paper identifies a mechanism, termed word recovery, that explains why large language models are robust to character-level tokenization: hidden states reconstruct canonical words from their characters, and attention among those characters is crucial to this process.
Contribution
The study introduces a decoding-based method to detect word recovery, provides causal evidence of its importance, and offers a mechanistic explanation for tokenization robustness in LLMs.
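The decoding-based detection can be pictured as a logit-lens style probe. You project an intermediate hidden state (for example, at the last character of a word) through the unembedding matrix and check whether the canonical word-level token ranks among the top-k decoded tokens. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `decode_logits`, `top_k_ids` and `shows_word_recovery` are hypothetical, the final layer norm is omitted, and the top-k criterion is illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def decode_logits(hidden: Vector, unembed: Sequence[Vector]) -> List[float]:
    """Logit-lens style decoding: dot the hidden state with each vocabulary row."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def top_k_ids(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def shows_word_recovery(hidden: Vector, unembed: Sequence[Vector],
                        canonical_id: int, k: int = 1) -> bool:
    """Hypothetical criterion: the canonical word token is among the top-k decoded tokens."""
    return canonical_id in top_k_ids(decode_logits(hidden, unembed), k)


if __name__ == "__main__":
    # Toy setup: 3-token vocabulary, 2-dimensional hidden states.
    unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    hidden = [0.2, 0.9]  # e.g. the state at the last character of a word
    print(shows_word_recovery(hidden, unembed, canonical_id=2))       # True
    print(shows_word_recovery(hidden, unembed, canonical_id=1))       # False
    print(shows_word_recovery(hidden, unembed, canonical_id=1, k=2))  # True
```

In the toy example the decoded logits are roughly [0.2, 0.9, 1.1], so token 2 is the top-1 prediction. In a real model, `unembed` would be the model's output embedding matrix, and the probe would be repeated across layers and positions.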
Findings
Hidden states reconstruct canonical words from characters
Removing the word-recovery subspace from hidden states degrades downstream performance
In-group attention among characters is critical for word recovery
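Removing a subspace from a hidden state amounts to projecting the state onto the orthogonal complement of that subspace. The sketch below assumes the subspace is spanned by a given set of direction vectors. How the paper actually constructs its word-recovery subspace is not stated here, so treat the choice of directions (for example, unembedding rows of canonical tokens) as an assumption. The helper names `orthonormal_basis` and `remove_subspace` are hypothetical. The directions are orthonormalized with Gram-Schmidt and then projected out: h' = h - Σ_q (h·q) q.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(directions: Sequence[Vector], eps: float = 1e-10) -> List[List[float]]:
    """Gram-Schmidt: orthonormalize the directions, dropping (near-)dependent ones."""
    basis: List[List[float]] = []
    for v in directions:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [x - c * y for x, y in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([x / norm for x in w])
    return basis


def remove_subspace(hidden: Vector, directions: Sequence[Vector]) -> List[float]:
    """Project hidden onto the orthogonal complement of span(directions)."""
    out = list(hidden)
    for q in orthonormal_basis(directions):
        c = dot(out, q)
        out = [x - c * y for x, y in zip(out, q)]
    return out


if __name__ == "__main__":
    h = [3.0, 4.0, 5.0]
    dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # together these span the first two axes
    print(remove_subspace(h, dirs))  # [0.0, 0.0, 5.0]
```

Orthonormalizing first matters. Subtracting projections onto non-orthogonal directions one at a time would not, in general, leave a vector orthogonal to the whole subspace.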
Abstract
Large language models (LLMs) trained with canonical tokenization exhibit surprising robustness to non-canonical inputs such as character-level tokenization, yet the mechanisms underlying this robustness remain unclear. We study this phenomenon through mechanistic interpretability and identify a core process we term word recovery. We first introduce a decoding-based method to detect word recovery, showing that hidden states reconstruct canonical word-level token identities from character-level inputs. We then provide causal evidence by removing the corresponding subspace from hidden states, which consistently degrades downstream task performance. Finally, we conduct a fine-grained attention analysis and show that in-group attention among characters belonging to the same canonical token is critical for word recovery: masking such attention in early layers substantially reduces both…
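The in-group attention masking described in the abstract can be sketched as a modified causal attention mask. Each character is labeled with the index of the canonical token it belongs to, and attention from a character to earlier characters of the same canonical token is blocked. This is a sketch under assumptions, not the authors' code. The helper names are hypothetical, and two further choices are ours: keeping each position's attention to itself, and passing the set of "early" layers in as a parameter.

```python
from typing import Iterable, List, Sequence

NEG_INF = float("-inf")


def causal_allowed(n: int) -> List[List[bool]]:
    """Standard causal pattern: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def in_group_masked_allowed(groups: Sequence[int]) -> List[List[bool]]:
    """Causal pattern with in-group attention removed.

    groups[i] is the index of the canonical token that character i belongs to.
    Attention from i to an earlier character j in the same group is blocked;
    each position keeps attention to itself (an assumption of this sketch).
    """
    n = len(groups)
    allowed = causal_allowed(n)
    for i in range(n):
        for j in range(i):
            if groups[i] == groups[j]:
                allowed[i][j] = False
    return allowed


def additive_mask(allowed: Sequence[Sequence[bool]]) -> List[List[float]]:
    """Convert a boolean pattern into an additive mask for pre-softmax scores."""
    return [[0.0 if ok else NEG_INF for ok in row] for row in allowed]


def mask_for_layer(layer: int, groups: Sequence[int],
                   masked_layers: Iterable[int]) -> List[List[float]]:
    """Use the in-group mask only in the chosen (e.g. early) layers."""
    if layer in set(masked_layers):
        return additive_mask(in_group_masked_allowed(groups))
    return additive_mask(causal_allowed(len(groups)))


if __name__ == "__main__":
    groups = [0, 0, 0, 1, 1]  # characters 0-2 form one canonical token, 3-4 the next
    for row in in_group_masked_allowed(groups):
        print("".join("x" if ok else "." for ok in row))
```

With these groups, character 4 can still attend to characters 0 to 2 from the previous word and to itself, but not to character 3 from its own word. Cross-group context is preserved, so any drop in word recovery can be attributed to the removed in-group links.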
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Neurobiology of Language and Bilingualism
