A Lightweight Method to Disrupt Memorized Sequences in LLM

Parjanya Prajakta Prashant; Kaustubh Ponkshe; Babak Salimi

arXiv:2502.05159·cs.LG·May 28, 2025

Open Access

TL;DR

TokenSwap is a lightweight, post-hoc method that reduces memorization in large language models by replacing the large model's probabilities for selected tokens with those of a much smaller model, with negligible loss in task performance.

Contribution

Introduces TokenSwap, a practical post-hoc technique that mitigates memorization in large language models using small auxiliary models without retraining.

Findings

01. Up to 10× reduction in memorization

02. Negligible impact on task performance

03. Applicable to models such as Pythia and Llama-3

Abstract

As language models scale, their performance improves dramatically across a wide range of tasks, but so does their tendency to memorize and regurgitate parts of their training data verbatim. This tradeoff poses serious legal, ethical, and safety concerns, especially in real-world deployments. Existing mitigation techniques, such as differential privacy or model unlearning, often require retraining or access to internal weights, making them impractical for most users. In this work, we introduce TokenSwap, a lightweight, post-hoc defense designed for realistic settings where the user can only access token-level outputs. Our key insight is that while large models are necessary for high task performance, small models (e.g., DistilGPT-2) are often sufficient to assign fluent, grammatically plausible probabilities to common function words, and, crucially, they memorize far less. By selectively…
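The abstract is cut off before it describes the swap itself, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, at each decoding step, the large model's probabilities on a fixed set of function-word tokens are replaced by the small model's probabilities, rescaled so that the total mass on that set is unchanged. The function name `swap_probabilities`, its parameters, and the toy distributions are all hypothetical.

```python
def swap_probabilities(p_main, p_aux, swap_tokens):
    """Return a next-token distribution that follows the small model on
    `swap_tokens` and the large model everywhere else.

    p_main, p_aux: dicts mapping token -> probability (large / small model).
    swap_tokens:   set of tokens (e.g. common function words) to swap.

    Assumption (not confirmed by the source): swapped probabilities are
    rescaled so the mass on `swap_tokens` equals the large model's mass
    there, which keeps the result normalized.
    """
    main_mass = sum(p_main.get(t, 0.0) for t in swap_tokens)
    aux_mass = sum(p_aux.get(t, 0.0) for t in swap_tokens)

    mixed = dict(p_main)  # start from the large model's distribution
    if aux_mass == 0.0:
        return mixed      # nothing to borrow from the small model

    scale = main_mass / aux_mass
    for t in swap_tokens:
        mixed[t] = p_aux.get(t, 0.0) * scale
    return mixed


# Toy example: the large model is overconfident in "the" (as it might be
# when reciting a memorized sequence); the small model is not.
p_main = {"the": 0.70, "a": 0.10, "cat": 0.15, "dog": 0.05}
p_aux = {"the": 0.30, "a": 0.30, "cat": 0.20, "dog": 0.20}

mixed = swap_probabilities(p_main, p_aux, {"the", "a"})
# "the" and "a" now share the large model's mass on that pair
# (approximately 0.4 each); "cat" and "dog" are untouched.
```

In this reading, content-bearing tokens keep the large model's probabilities, which is consistent with the claim that task performance is largely preserved, while the relative ranking among function words comes from the model that memorizes less.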

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression