Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models
Soumi Das, Camila Kolling, Mohammad Aflah Khan, Mahsa Amani, Bishwamittra Ghosh, Qinyuan Wu, Till Speicher, and Krishna P. Gummadi

TL;DR
This paper explores the trade-offs between privacy, utility, and efficiency when fine-tuning large language models. It finds that efficient methods such as LoRA can mitigate privacy risks about as well as differentially private training, which challenges the assumption that privacy and efficiency conflict.
Contribution
It demonstrates empirically that efficient fine-tuning methods can reduce privacy risks to a degree comparable to differentially private training, contradicting the belief that privacy and efficiency are at odds.
Findings
Efficient fine-tuning methods like LoRA mitigate privacy risks to a degree similar to differentially private training.
Privacy and efficiency objectives are not necessarily conflicting during fine-tuning.
Extensive evaluations across multiple models and datasets support these conclusions.
Abstract
We study the inherent trade-offs in minimizing privacy risks and maximizing utility, while maintaining high computational efficiency, when fine-tuning large language models (LLMs). A number of recent works in privacy research have attempted to mitigate privacy risks posed by memorizing fine-tuning data by using differentially private training methods (e.g., DP), albeit at a significantly higher computational cost (inefficiency). In parallel, several works in systems research have focussed on developing (parameter) efficient fine-tuning methods (e.g., LoRA), but few works, if any, investigated whether such efficient methods enhance or diminish privacy risks. In this paper, we investigate this gap and arrive at a surprising conclusion: efficient fine-tuning methods like LoRA mitigate privacy risks similar to private fine-tuning methods like DP. Our empirical finding directly contradicts…
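To make the efficiency side of the abstract concrete, here is a minimal sketch of the standard LoRA parameterization: the pretrained weight is frozen and only a low-rank update B·A is trained. The function names, the layer size, and the rank are illustrative assumptions, not the authors' configuration.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameter counts: full fine-tuning vs. a rank-`rank` LoRA adapter."""
    full = d_out * d_in            # every entry of the weight matrix is updated
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank) are updated
    return full, lora


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_effective_weight(W0, B, A, alpha, rank):
    """Return W0 + (alpha / rank) * B @ A, where W0 stays frozen."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, delta)]


# Hypothetical 4096 x 4096 projection with rank 8 (sizes chosen for illustration only).
full, lora = lora_trainable_params(4096, 4096, 8)
print(full, lora)
```

For this hypothetical layer the adapter trains 65,536 parameters instead of 16,777,216, which is where the efficiency gain comes from.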
Peer Reviews
Decision: Submitted to ICLR 2026
The idea of empirically computing effective privacy (in a DP sense) protection from non-DP heuristics is a common strategy. This paper takes the same approach for large language models with parameter-efficient fine-tuning as a heuristic. The paper is well-written, well-organized, and easy to follow (although the plots are hard to interpret, more on this later).
[W1] **Justification of the Metrics**: Even assuming the identification of sensitive and non-sensitive tokens is perfect (more in W2), the newly introduced privacy and utility metrics are not fully justified. For example:
* The onus is on the paper to justify these metrics fully and put them in context given past work. There are several standard and well-accepted measures of privacy: exposure (which is used in the experiments), success of membership inference attacks, etc. Why is this measure be
- The paper addresses an important problem of understanding privacy, utility and efficiency tradeoffs of LLM finetuning.
- The paper is well written and easy to follow.
- The experiments are conducted on a wide variety of models (Pythia, Gemma, Llama, Qwen) and on 2 different datasets.
- DP-SGD baseline seems deeply flawed. Several issues that jump out are: no epsilon guarantee, no mention of large batch sizes, token level clipping ("...where each sample corresponds to a token...").
- Does not compare against DP-LoRA
- Privacy loss is fairly non-standard.
- Theorem 1 is trivial and does not add any insights.
- Using GPT-4 for identifying sensitive tokens seems problematic when it is a top line metric for this work.
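For context on the clipping concern raised in this review, the following is a minimal sketch of one textbook DP-SGD aggregation step: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. Whatever counts as one "example" here (a full sequence or a single token) is the unit the resulting guarantee protects. This is not the paper's implementation; the function names and parameters are hypothetical.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale `grad` down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_noisy_mean(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale is tied to the clipping norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Toy gradients, purely illustrative.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_sgd_noisy_mean(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Turning this step into an epsilon guarantee additionally requires a privacy accountant over the noise multiplier, sampling rate, and number of steps, which this sketch omits.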
1. Broad empirical study across multiple model families, datasets, and fine-tuning methods with consistent comparisons, including privacy loss and canary exposure.
2. A new loss function that clearly divides the loss into utility and privacy.
3. This paper presents interesting findings about the privacy and utility of LoRA.
1. Though the analysis of privacy and utility loss is interesting, the current design of privacy loss seems to focus only on canary-related attacks, ignoring membership inference attacks, which are also an important part of privacy. Therefore, the findings on the privacy of LoRA might be overstated.
2. Some phenomena during training should be explained. For example, in Figure 3 (b), the utility actually decreases with more training.
3. Only LoRA is considered as a PEFT method in the paper. Exper
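The reviews refer to canary exposure several times. As a point of reference, here is a minimal sketch of the commonly used rank-based definition, exposure = log2(|R|) − log2(rank), where the inserted canary is ranked by model loss against the other candidate sequences in the space R. The source does not state which estimator the paper uses, so this is an assumption, and the function name and toy losses are hypothetical.

```python
import math


def exposure(canary_loss, other_candidate_losses):
    """Rank-based exposure of a canary among candidate sequences.

    Lower loss means the model finds the sequence more likely. The rank is
    1-based, and the candidate space includes the canary itself.
    """
    total = len(other_candidate_losses) + 1
    rank = 1 + sum(1 for loss in other_candidate_losses if loss < canary_loss)
    return math.log2(total) - math.log2(rank)


# Toy losses, purely illustrative.
print(exposure(0.5, [1.0, 2.0, 3.0]))  # canary ranked first among 4 candidates
print(exposure(5.0, [1.0, 2.0, 3.0]))  # canary ranked last among 4 candidates
```

A canary the model ranks first gets the maximum exposure log2(|R|), while one ranked last gets zero, so higher values indicate stronger memorization.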
Taxonomy
Topics: Privacy-Preserving Technologies in Data
Methods: Pythia · LLaMA
