From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan, Matanel Oren, Yuval Reif, and Roy Schwartz

TL;DR
This paper investigates how large language models internally represent words. It reveals an intrinsic detokenization process that combines sub-words into coherent whole-word representations, and shows that this process can be used to expand a model's vocabulary without retraining.
Contribution
The study uncovers the internal detokenization mechanism in LLMs and demonstrates a practical method for expanding their vocabulary without additional training.
Findings
LLMs perform an intrinsic detokenization process within early and middle layers.
Models can understand out-of-vocabulary words from internal representations.
Expanding vocabulary reduces input length and inference time with minimal accuracy loss.
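The input-length finding can be illustrated with a toy example. The sketch below is not the paper's method or tokenizer: it assumes a hypothetical greedy longest-match tokenizer and a made-up vocabulary, and only shows why adding whole-word entries shortens the token sequence for the same text.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of each whitespace-separated word (toy tokenizer)."""
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest candidate first, shrinking until a vocab entry matches.
            for end in range(len(word), start, -1):
                if word[start:end] in vocab:
                    tokens.append(word[start:end])
                    start = end
                    break
            else:
                # No entry matched: emit one character so the loop always advances.
                tokens.append(word[start])
                start += 1
    return tokens


# Hypothetical vocabularies, chosen only for illustration.
base_vocab = {"un", "break", "able", "token", "ization", "is"}
expanded_vocab = base_vocab | {"unbreakable", "tokenization"}

text = "unbreakable tokenization is unbreakable"
base_tokens = tokenize(text, base_vocab)
expanded_tokens = tokenize(text, expanded_vocab)

print(len(base_tokens), base_tokens)
print(len(expanded_tokens), expanded_tokens)
```

In this toy setting the expanded vocabulary encodes the same text in fewer tokens. How much input length and inference time actually improve for a real model depends on its tokenizer and data, which this sketch does not model.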
Abstract
Natural language is composed of words, but modern large language models (LLMs) process sub-words as input. A natural question raised by this discrepancy is whether LLMs encode words internally, and if so how. We present evidence that LLMs engage in an intrinsic detokenization process, where sub-word sequences are combined into coherent whole-word representations at their last token. Our experiments show that this process primarily takes place within the early and middle layers of the model. We further demonstrate its robustness to arbitrary splits (e.g., "cats" to "ca" and "ts"), typos, and, importantly, to out-of-vocabulary words: when feeding the last token internal representations of such words to the model as input, it can "understand" them as the complete word despite never seeing such representations as input during training. Our findings suggest that LLMs maintain a latent…
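The phrase "whole-word representations at their last token" refers to a specific position in the token sequence. The sketch below shows only that indexing step, under stated assumptions: the sub-word splits are hypothetical, and `hidden_states` is a placeholder nested list standing in for per-layer model activations, not output from a real model or the authors' code.

```python
def last_token_positions(word_pieces):
    """For each word, return the index of its final sub-word token in the flat sequence."""
    positions = []
    cursor = 0
    for pieces in word_pieces:
        cursor += len(pieces)
        positions.append(cursor - 1)
    return positions


def word_vectors(hidden_states, word_pieces, layer):
    """Select the hidden state at each word's last token from one chosen layer."""
    return [hidden_states[layer][pos] for pos in last_token_positions(word_pieces)]


# Hypothetical sub-word splits for the words "the", "cats", "sleep".
word_pieces = [["the"], ["ca", "ts"], ["sle", "ep"]]
num_tokens = sum(len(pieces) for pieces in word_pieces)

# Placeholder activations: hidden_states[layer][position] is a 2-d vector whose
# values simply record (layer, position) so the selection is easy to inspect.
hidden_states = [
    [[float(layer), float(pos)] for pos in range(num_tokens)]
    for layer in range(4)
]

positions = last_token_positions(word_pieces)
vectors = word_vectors(hidden_states, word_pieces, layer=2)
print(positions)
print(vectors)
```

With a real model, the placeholder list would be replaced by the per-layer hidden states, and the choice of `layer` would follow the abstract's observation that the process mainly takes place in early and middle layers.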
Peer Reviews
Decision: ICLR 2025 Poster
This paper answers particular unanswered questions surrounding "detokenization", which has been repeatedly observed and discussed without being properly studied. These are important for observations around, for example, stages of inference in language models. Interpretability results on early layers of LMs are often lacking, as vocab projections are much easier to perform at later layers. This work provides interesting and convincing results for one role early layers take on in these models, wh
The evidence for a third stage of processing in Figure 2b is a little sparse. These results are only for one model, and the degree to which accuracy drops is not substantial enough to obviously be due to a difference in processing altogether. These results could be made stronger by including results for more models. As a motivating example, it is fine, but perhaps isn't the best use of that space if this point can't be made more strongly. Typos: L370: "form"
1. This paper analyzes the process of detokenization across transformer network layers via a series of targeted experiments. It builds an intuitive understanding that agrees with many prior works in layer-based analysis. 2. The paper proposes an interesting method for training-free expansion of the model vocabulary by leveraging the insights into internal word representations. This method is shown to be effective in limited experiments. See below in "weaknesses" for further thoughts on this. 3
1. The concept of an inner lexicon is interesting, but not novel as is claimed in this work. The idea follows implicitly from prior work in the memorization of training data, and explicitly in works about tokenization, such as the introduction of BPE (which is discussed greatly in this paper). It is the stated goal of subword tokenizers to enable learning a vocabulary of words and concepts which is larger than the vocabulary of concrete tokens through the process of token combination. It is nice
This paper addresses a crucial question: how can language models construct symbolic representations of entire words when their input comes from tokenizers that often fragment words in ways that disregard their morphological structure? Specifically, the authors investigate whether LMs internally form representations of morphological units that help bridge the gap between the tokenized input and the naturally holistic nature of words in language. Through experiments, the paper presents some eviden
I believe there is a disparity between the paper’s claims and the experimental evidence provided to support them. Specifically, some of the experiments lend themselves to alternative interpretations, which could be clarified with additional baselines or experiments. The paper's claim is that the model comes up with an "internal lexicon" that creates hidden representations of "virtual" words, even when fed, e.g., word pieces as input. This is a claim on the computation carried out by the model, i.e., it
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices
