Learning the Signature of Memorization in Autoregressive Language Models

David Ilić; Kostadin Cvejoski; David Stanojević; Evgeny Grigorenko

arXiv:2604.03199 · cs.CL · April 6, 2026

TL;DR

This paper introduces LT-MIA, a transferable learned membership inference attack that detects memorization signatures across diverse language model architectures and data domains, surpassing prior heuristics.

Contribution

It presents the first transferable learned attack for language models, demonstrating generalization across architectures and data types by training on diverse transformer models.

Findings

01. Achieves high AUC scores (above 0.93) across multiple unseen architectures.

02. Transfers effectively to code data despite training only on natural language.

03. Outperforms existing heuristic-based membership inference methods significantly.
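
For readers unfamiliar with the metric, AUC measures how often an attack scores a true training member above a non-member. The following is a minimal sketch, assuming a hypothetical `auc` helper and made-up score lists (not data from the paper), that computes it through the pairwise ranking formulation:

```python
def auc(member_scores, nonmember_scores):
    """Fraction of (member, non-member) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical attack scores, for illustration only (higher = "more likely a member").
members = [0.9, 0.8, 0.4]
nonmembers = [0.6, 0.3, 0.2]
print(auc(members, nonmembers))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members.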

Abstract

All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation that fine-tuning any model on any corpus yields unlimited labeled data, since membership is known by construction. This removes the shadow model bottleneck and brings membership inference into the deep learning era: learning what matters rather than designing it, with generalization through training diversity and scale. We discover that fine-tuning language models produces an invariant signature of memorization detectable across architectural families and data domains. We train a membership inference classifier exclusively on transformer-based models. It transfers zero-shot to Mamba (state-space), RWKV-4 (linear…
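
The hand-crafted heuristics named in the abstract can each be written as a short scoring function over per-token log-probabilities. The sketch below is illustrative only and is not the paper's implementation: the function names, the default `k=0.2`, and the example log-probabilities are all assumptions, and the learned attack itself is not shown.

```python
def loss_score(token_logprobs):
    """Loss thresholding: mean token log-probability (negative loss).
    Higher values suggest the text is more likely a training member."""
    return sum(token_logprobs) / len(token_logprobs)


def min_k_score(token_logprobs, k=0.2):
    """Min-K%: mean log-probability over the k fraction of least likely tokens."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def reference_calibrated_score(target_logprobs, reference_logprobs):
    """Reference calibration: target-model score minus reference-model score."""
    return loss_score(target_logprobs) - loss_score(reference_logprobs)


# Hypothetical per-token log-probabilities for one text sample.
target = [-0.1, -0.2, -3.0, -0.5, -0.2]      # fine-tuned (target) model
reference = [-1.0, -1.2, -3.5, -1.5, -0.8]   # reference model (assumed available)

print(loss_score(target))
print(min_k_score(target, k=0.2))
print(reference_calibrated_score(target, reference))
```

Each function reduces a sample to a single number that is then thresholded, which is the design limitation the abstract contrasts with a learned classifier.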

Code & Models

Repositories

JetBrains-Research/learned-mia (GitHub)
