Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs

Jérémie Dentan; Davide Buscaldi; Sonia Vanier

arXiv:2508.02573 · cs.CL · November 14, 2025

TL;DR

This paper trains CNNs on an LLM's attention weights to test how well the existing taxonomy of verbatim memorization matches the attention patterns involved in decoding. It proposes a new three-category taxonomy (guessed, recalled, and non-memorized samples) and a visualization technique to localize the corresponding mechanisms.

Contribution

The paper proposes a new taxonomy of memorization that is aligned with the LLM's attention weights, together with a visualization technique that distinguishes and localizes the underlying memorization mechanisms.

Findings

01. Existing taxonomy poorly reflects attention mechanisms

02. Most memorized samples are guessed, not recalled

03. Few-shot memorization is not a distinct attention process

Abstract

Verbatim memorization in Large Language Models (LLMs) is a multifaceted phenomenon involving distinct underlying mechanisms. We introduce a novel method to analyze the different forms of memorization described by the existing taxonomy. Specifically, we train Convolutional Neural Networks (CNNs) on the attention weights of the LLM and evaluate the alignment between this taxonomy and the attention weights involved in decoding. We find that the existing taxonomy performs poorly and fails to reflect distinct mechanisms within the attention blocks. We propose a new taxonomy that maximizes alignment with the attention weights, consisting of three categories: memorized samples that are guessed using language modeling abilities, memorized samples that are recalled due to high duplication in the training set, and non-memorized samples. Our results reveal that few-shot verbatim memorization…
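The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture or training setup: it only shows, in pure Python, how a stack of attention maps could pass through one convolution layer, a pooling step, and a linear layer to yield probabilities over the three categories named in the abstract. The input layout (one channel per attention head), the layer sizes, the random untrained weights, and the helper names `toy_attention`, `random_params`, and `classify` are all assumptions made for illustration.

```python
import math
import random

# Category names come from the abstract; everything else is an illustrative assumption.
CLASSES = ["guessed", "recalled", "non-memorized"]


def conv2d_relu(maps, kernels, bias):
    """Valid 2D convolution over stacked maps, followed by ReLU.

    maps:    [C_in][H][W] attention maps (assumed: one channel per head)
    kernels: [C_out][C_in][K][K]
    returns: [C_out][H-K+1][W-K+1]
    """
    c_in, h, w = len(maps), len(maps[0]), len(maps[0][0])
    k = len(kernels[0][0])
    out = []
    for o, kern in enumerate(kernels):
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                acc = bias[o]
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += kern[c][di][dj] * maps[c][i + di][j + dj]
                row.append(max(0.0, acc))
            plane.append(row)
        out.append(plane)
    return out


def global_avg_pool(feats):
    """Average each feature plane down to a single number."""
    return [sum(map(sum, p)) / (len(p) * len(p[0])) for p in feats]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(maps, params):
    """Forward pass: conv -> ReLU -> global average pool -> linear -> softmax."""
    feats = global_avg_pool(conv2d_relu(maps, params["kernels"], params["conv_bias"]))
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(params["fc_weights"], params["fc_bias"])]
    return softmax(logits)


def random_params(c_in, c_out=4, k=3, seed=0):
    """Untrained random weights; a real classifier would learn these from labeled samples."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.5, 0.5)
    return {
        "kernels": [[[[u() for _ in range(k)] for _ in range(k)]
                     for _ in range(c_in)] for _ in range(c_out)],
        "conv_bias": [0.0] * c_out,
        "fc_weights": [[u() for _ in range(c_out)] for _ in range(len(CLASSES))],
        "fc_bias": [0.0] * len(CLASSES),
    }


def toy_attention(num_heads, seq_len, seed=1):
    """Random causal attention maps with rows summing to 1 (stand-in for real LLM attention)."""
    rng = random.Random(seed)
    maps = []
    for _ in range(num_heads):
        plane = []
        for t in range(seq_len):
            raw = [rng.uniform(0.1, 1.0) if s <= t else 0.0 for s in range(seq_len)]
            z = sum(raw)
            plane.append([x / z for x in raw])
        maps.append(plane)
    return maps


attention = toy_attention(num_heads=2, seq_len=6)
probs = classify(attention, random_params(c_in=2))
prediction = CLASSES[probs.index(max(probs))]
```

Because the weights are random, `prediction` carries no meaning here. In a trained setting, the classifier's accuracy on held-out samples would serve as a measure of how well a given taxonomy aligns with the attention weights, which is the comparison the abstract describes.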

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)