Predicting memorization within Large Language Models fine-tuned for classification
Jérémie Dentan, Davide Buscaldi, Aymen Shabou, Sonia Vanier

TL;DR
This paper introduces a novel, low-cost method to detect memorized training samples in large language models during early training stages, enhancing data privacy and model robustness.
Contribution
It presents a new a priori detection approach for memorized data in fine-tuned LLMs, supported by theoretical insights and adaptable to various classification models.
Findings
Effective early-stage detection of memorized samples
Method requires low computational resources
Supports systematic identification of vulnerable data
Abstract
Large Language Models have received significant attention due to their ability to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, which poses a serious threat when that data is disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand what elements are memorized and why. This area of research is largely unexplored, with most existing works providing a posteriori explanations. To address this gap, we propose a new approach to detect memorized samples a priori in LLMs fine-tuned for classification tasks. This method is effective from the early stages of training and readily adaptable to other classification settings, such as training vision models from scratch. Our method is supported by new theoretical results and requires a low computational budget. We achieve strong empirical results, paving…
Taxonomy
Topics: Natural Language Processing Techniques
