LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Anton Razzhigaev; Matvey Mikhalchuk; Temurbek Rahmatullaev; Elizaveta Goncharova; Polina Druzhinina; Ivan Oseledets; Andrey Kuznetsov

arXiv:2502.15007 · cs.CL · February 24, 2025

TL;DR

This paper reveals that minor tokens like punctuation and stopwords play a crucial role in LLMs' contextual memory, and introduces LLM-Microscope, a toolkit for analyzing token-level contributions and model representations.

Contribution

It uncovers the hidden importance of filler tokens in LLMs' context encoding and provides an open-source toolkit for detailed analysis of token and layer contributions.

Findings

01. Removing minor tokens (stopwords, articles, commas) degrades model performance on MMLU and BABILong-4k.

02. Contextualization and linearity of embeddings are strongly correlated.

03. The toolkit enables visualization and measurement of token and layer contributions.
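
The token-removal finding can be illustrated with a small preprocessing sketch. This is not the paper's procedure: it works on whitespace-separated words rather than model tokens, and the `STOPWORDS` set and the `strip_minor_tokens` helper are hypothetical names introduced only for illustration.

```python
import string

# Hypothetical, deliberately tiny stopword list; a real study would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in"}


def strip_minor_tokens(text, stopwords=STOPWORDS):
    """Drop punctuation and stopwords from a prompt, word by word."""
    kept = []
    for word in text.split():
        # Remove leading and trailing punctuation such as commas and periods.
        core = word.strip(string.punctuation)
        if core and core.lower() not in stopwords:
            kept.append(core)
    return " ".join(kept)


example = "The cat, which was hungry, sat on the mat."
stripped = strip_minor_tokens(example)
print(stripped)
```

Feeding both `example` and `stripped` to the same model and comparing task accuracy would be one way to probe how much the removed tokens matter.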

Abstract

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens -- especially stopwords, articles, and commas -- consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis also shows a strong correlation between contextualization and linearity, where linearity measures how closely the transformation from one layer's embeddings to the next can be approximated by a single linear mapping. These findings underscore the hidden importance of filler tokens in maintaining context. For further exploration, we present LLM-Microscope, an open-source toolkit that assesses token-level nonlinearity, evaluates contextual memory, visualizes intermediate layer…
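
The linearity notion described in the abstract can be sketched with an ordinary least-squares fit between the embeddings of two consecutive layers. The code below is a minimal illustration under stated assumptions, not the toolkit's API: the function name `linearity_score`, the centering and Frobenius normalization, and the synthetic data are all choices made here for the example.

```python
import numpy as np


def linearity_score(x, y):
    """Score how well one linear map sends layer embeddings x to y.

    x, y: arrays of shape (n_tokens, hidden_dim) from consecutive layers.
    Returns a value in [0, 1]; 1 means y is exactly a linear function of x.
    The normalization used here is an assumption of this sketch.
    """
    # Center both sets of embeddings so the fit ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Scale to unit Frobenius norm so the residual is comparable across layers.
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # Best single linear mapping in the least-squares sense.
    a, *_ = np.linalg.lstsq(x, y, rcond=None)
    residual = np.linalg.norm(x @ a - y) ** 2
    return 1.0 - residual


# Synthetic stand-ins for real hidden states (assumption: no model is loaded here).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 16))
w = rng.normal(size=(16, 16))
score_linear = linearity_score(x, x @ w)                     # purely linear layer
score_nonlinear = linearity_score(x, np.tanh(3.0 * (x @ w)))  # saturating nonlinearity
print(score_linear, score_nonlinear)
```

On real models, `x` and `y` would be the hidden states of the same tokens at layers l and l+1; the purely linear case scores close to 1, and the saturating case scores lower.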
