TL;DR
This paper reveals that minor tokens like punctuation and stopwords play a crucial role in LLMs' contextual memory, and introduces LLM-Microscope, a toolkit for analyzing token-level contributions and model representations.
Contribution
It uncovers the hidden importance of filler tokens in LLMs' context encoding and provides an open-source toolkit for detailed analysis of token and layer contributions.
Findings
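Removing minor tokens (stopwords, articles, commas) consistently degrades performance on MMLU and BABILong-4k, even when only irrelevant tokens are removed.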
Contextualization correlates strongly with linearity, i.e., how closely a single linear mapping approximates the transformation from one layer's embeddings to the next.
Toolkit enables visualization and measurement of token and layer contributions.
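The first finding rests on an ablation: strip the minor tokens from a prompt, then compare task accuracy with and without them. Below is a minimal sketch of the stripping step only. The stopword set, the regex tokenization, and the function name `strip_minor_tokens` are all illustrative assumptions; they are neither the paper's token lists nor the LLM-Microscope API.

```python
import re

# Hypothetical, deliberately tiny word list for illustration (not the paper's).
MINOR_WORDS = {"the", "a", "an", "of", "in", "to", "and"}

def strip_minor_tokens(text, drop_words=MINOR_WORDS, drop_punct=True):
    """Return `text` with listed words (and optionally punctuation) removed."""
    # Crude tokenization: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    kept = []
    for tok in tokens:
        if tok.lower() in drop_words:
            continue  # drop stopwords and articles
        if drop_punct and re.fullmatch(r"[^\w\s]", tok):
            continue  # drop commas, periods, and other punctuation
        kept.append(tok)
    return " ".join(kept)

print(strip_minor_tokens("The cat sat in the hat, and left."))
```

In an actual evaluation, the original and stripped prompts would each be fed to the same model and the benchmark scores compared. A real run would also need the model's own tokenizer rather than this regex split.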
Abstract
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens -- especially stopwords, articles, and commas -- consistently degrades performance on MMLU and BABILong-4k, even when only irrelevant tokens are removed. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate layer…
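The linearity notion described in the abstract can be made concrete with a small sketch: fit one linear map between two layers' embeddings and measure how much is left unexplained. The version below assumes centering, Frobenius normalization, and an ordinary least-squares fit. These choices and the name `linearity_score` are assumptions for illustration; the toolkit's own scoring may differ in its details.

```python
import numpy as np

def linearity_score(X, Y):
    """Score in [0, 1]; 1 means Y is exactly a linear map of X.

    X, Y: arrays of shape (n_tokens, dim) holding the embeddings of the
    same tokens at two consecutive layers (an assumed input layout).
    """
    # Center each set so a constant offset does not count as nonlinearity.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Scale to unit Frobenius norm so the residual is bounded by 1.
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # Best single linear mapping A minimizing ||X A - Y||_F.
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residual = np.linalg.norm(X @ A - Y) ** 2
    return 1.0 - residual

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
W = rng.normal(size=(8, 8))
print(linearity_score(X, X @ W))               # exactly linear transformation
print(linearity_score(X, np.tanh(3 * X @ W)))  # saturating nonlinearity
```

The first call should score essentially 1, while the saturating `tanh` case scores noticeably lower, since no single matrix can reproduce it.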