Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
Badrinath Ramakrishnan, Akshaya Balaji

TL;DR
This paper analyzes privacy risks from data memorization in fine-tuned LLMs and proposes a multi-layered privacy protection framework that significantly reduces data leakage while preserving model utility.
Contribution
It provides an empirical study of data memorization in fine-tuned LLMs and introduces four complementary privacy protection methods, each evaluated experimentally.
Findings
Fine-tuning with repeated sensitive data raises privacy leakage rates from a baseline of 0-5% to 60-75%.
Proposed methods reduce data leakage to 0%.
Protection techniques maintain 94.7% of model utility.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, but their tendency to memorize training data poses significant privacy risks, particularly during fine-tuning processes. This paper presents a comprehensive empirical analysis of data memorization in fine-tuned LLMs and introduces a novel multi-layered privacy protection framework. Through controlled experiments on modern LLM architectures including GPT-2, Phi-3, and Gemma-2, we demonstrate that fine-tuning with repeated sensitive data increases privacy leakage rates from baseline levels of 0-5% to 60-75%, representing a 64.2% average increase across tested models. We propose and rigorously evaluate four complementary privacy protection methods: semantic data deduplication, differential privacy during generation, entropy-based filtering, and pattern-based content…
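The leakage rates quoted above can be illustrated with a minimal sketch. The abstract as shown here does not define the exact metric, so this assumes the simplest reading: the fraction of planted sensitive strings that reappear verbatim in model outputs. The function name `leakage_rate` and all sample strings are hypothetical and are not taken from the paper.

```python
def leakage_rate(secrets, generations):
    """Fraction of planted secrets that appear verbatim in any generation.

    Assumption: verbatim substring match. The paper may use a different
    (for example fuzzy or pattern-based) matching criterion.
    """
    if not secrets:
        return 0.0
    leaked = sum(any(s in g for g in generations) for s in secrets)
    return leaked / len(secrets)


# Hypothetical sensitive strings inserted into a fine-tuning set.
secrets = [
    "SSN 123-45-6789",
    "card 4111 1111 1111 1111",
    "alice@example.com",
    "555-0100",
]

# Hypothetical sampled outputs before and after fine-tuning.
baseline_outputs = [
    "The weather is nice today.",
    "Contact support for help.",
]
finetuned_outputs = [
    "Her SSN 123-45-6789 was on file.",
    "Email alice@example.com or call 555-0100.",
]

baseline = leakage_rate(secrets, baseline_outputs)
finetuned = leakage_rate(secrets, finetuned_outputs)
print(f"baseline: {baseline:.0%}, fine-tuned: {finetuned:.0%}")
```

Under this definition, the change between the two rates is naturally read in percentage points, which is one plausible interpretation of the average increase reported in the abstract.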
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
