Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA
Fei Wang, Baochun Li

TL;DR
This paper investigates how LoRA fine-tuning affects memorization in large language models. It finds that LoRA reduces memorization risk compared to full fine-tuning while preserving task performance.
Contribution
It shows that model scale and data duplication, which strongly influence memorization in pre-training and full fine-tuning, do not follow the same trend in LoRA fine-tuning, which points to LoRA's potential for safer LLM deployment.
Findings
LoRA significantly reduces memorization risks compared to full fine-tuning.
Model scale and data duplication do not affect memorization in LoRA fine-tuning the way they do in pre-training and full fine-tuning.
LoRA maintains strong task performance despite lower memorization.
Abstract
Memorization in large language models (LLMs) makes them vulnerable to data extraction attacks. While pre-training memorization has been extensively studied, fewer works have explored its impact in fine-tuning, particularly for LoRA fine-tuning, a widely adopted parameter-efficient method. In this work, we re-examine memorization in fine-tuning and uncover a surprising divergence from prior findings across different fine-tuning strategies. Factors such as model scale and data duplication, which strongly influence memorization in pre-training and full fine-tuning, do not follow the same trend in LoRA fine-tuning. Using a more relaxed similarity-based memorization metric, we demonstrate that LoRA significantly reduces memorization risks compared to full fine-tuning, while still maintaining strong task performance.
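The abstract refers to a relaxed, similarity-based memorization metric, but this page does not define it. As a rough illustration only, the sketch below contrasts a strict exact-match test with a thresholded similarity test over (generated, reference) text pairs. The token-level `difflib` ratio, the 0.8 threshold, and all function names are assumptions made for this example; they are not the authors' metric.

```python
from difflib import SequenceMatcher


def similarity(generated: str, reference: str) -> float:
    """Token-level similarity in [0, 1]. A stand-in metric, not the paper's."""
    return SequenceMatcher(None, generated.split(), reference.split()).ratio()


def is_memorized(generated: str, reference: str,
                 threshold: float = 0.8, exact: bool = False) -> bool:
    """Flag a generation as memorized.

    exact=True  -> strict test: token sequences must be identical.
    exact=False -> relaxed test: similarity must reach the (assumed) threshold.
    """
    if exact:
        return generated.split() == reference.split()
    return similarity(generated, reference) >= threshold


def memorization_rate(pairs, threshold: float = 0.8, exact: bool = False) -> float:
    """Fraction of (generated, reference) pairs flagged as memorized."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(is_memorized(g, r, threshold, exact) for g, r in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical model outputs paired with their training references.
    samples = [
        ("the cat sat on the mat", "the cat sat on the mat"),   # verbatim copy
        ("the cat sat on a mat", "the cat sat on the mat"),     # near copy
        ("completely different text here", "the cat sat on the mat"),
    ]
    print("exact-match rate:", memorization_rate(samples, exact=True))
    print("relaxed rate:    ", memorization_rate(samples, threshold=0.8))
```

A relaxed test of this kind counts near-verbatim reproductions that an exact-match test would miss, so it generally gives a memorization estimate at least as high as the strict one on the same outputs.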
Taxonomy
Topics: Business Process Modeling and Analysis · Simulation Techniques and Applications · Advanced Data Processing Techniques
