Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time

Huihan Li; You Chen; Siyuan Wang; Yixin He; Ninareh Mehrabi; Rahul Gupta; Xiang Ren

arXiv:2508.02037·cs.CL·August 22, 2025


TL;DR

This paper introduces STIM, a framework that identifies and analyzes the sources of memorization in language models during reasoning, revealing how memorization impacts errors and reasoning accuracy.

Contribution

STIM provides a novel token-level attribution method to diagnose memorization sources in chain-of-thought reasoning, enhancing understanding of model errors.

Findings

01

Models rely more on memorization in complex or long-tail cases.

02

Local memorization is often the dominant driver of errors, accounting for up to 67% of them.

03

Memorization scores can predict wrong reasoning tokens.

Abstract

Large Language Models (LLMs) perform well on reasoning benchmarks but often fail when inputs alter slightly, raising concerns about the extent to which their success relies on memorization. This issue is especially acute in Chain-of-Thought (CoT) reasoning, where spurious memorized patterns can trigger intermediate errors that cascade into incorrect final answers. We introduce STIM, a novel framework for Source-aware Token-level Identification of Memorization, which attributes each token in a reasoning chain to one of multiple memorization sources - local, mid-range, or long-range - based on their statistical co-occurrence with the token in the pretraining corpus. Our token-level analysis across tasks and distributional settings reveals that models rely more on memorization in complex or long-tail cases, and that local memorization is often the dominant driver of errors, leading to up…
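The abstract describes attributing each reasoning token to a local, mid-range, or long-range source via corpus co-occurrence, but this page does not give the scoring details. The sketch below is only a toy illustration of that idea, not the authors' implementation. Everything in it is an assumption made for the example: the function names (`build_counts`, `source_score`, `attribute_token`), the document-level conditional frequency used as the score, the `local_window` cutoff, the mapping of "long-range" to prompt tokens, and the tiny hand-written corpus standing in for pretraining data.

```python
from collections import Counter


def build_counts(corpus):
    """Count how many documents contain each token and each ordered token pair."""
    token_docs = Counter()
    pair_docs = Counter()
    for doc in corpus:
        vocab = set(doc)
        token_docs.update(vocab)
        pair_docs.update((a, b) for a in vocab for b in vocab if a != b)
    return token_docs, pair_docs


def source_score(context, target, token_docs, pair_docs):
    """Largest estimate of P(target in doc | context token in doc) over the context."""
    best = 0.0
    for c in context:
        if c != target and token_docs[c]:
            best = max(best, pair_docs[(c, target)] / token_docs[c])
    return best


def attribute_token(prompt, chain, index, token_docs, pair_docs, local_window=1):
    """Assign chain[index] to the source with the highest co-occurrence score.

    Assumed source definitions (hypothetical, for illustration only):
      local      -> the `local_window` chain tokens right before the target
      mid-range  -> earlier chain tokens outside the local window
      long-range -> tokens of the prompt
    """
    target = chain[index]
    start = max(0, index - local_window)
    contexts = {
        "local": chain[start:index],
        "mid-range": chain[:start],
        "long-range": prompt,
    }
    scores = {
        name: source_score(ctx, target, token_docs, pair_docs)
        for name, ctx in contexts.items()
    }
    return max(scores, key=scores.get), scores


# Toy corpus: "equals four" is frequent, so "four" is strongly tied to "equals".
corpus = [
    ["equals", "four"],
    ["equals", "four"],
    ["equals", "four"],
    ["two", "plus", "three", "equals", "five"],
]
token_docs, pair_docs = build_counts(corpus)

prompt = ["two", "plus", "three"]
chain = ["two", "plus", "three", "equals", "four"]  # final token is wrong

source, scores = attribute_token(prompt, chain, 4, token_docs, pair_docs)
```

In this toy setup the wrong token "four" is attributed to the local source, because it co-occurs with the immediately preceding "equals" far more often than with any prompt or earlier chain token. That mirrors, in miniature, the kind of locally memorized pattern the abstract identifies as a common driver of errors.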


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques