The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao

TL;DR
This paper introduces SPIN, a training-free method inspired by information theory, to simultaneously improve fairness and privacy awareness in large language models, overcoming a trade-off observed with traditional fine-tuning methods.
Contribution
The paper proposes SPIN, a novel training-free approach that reduces the mutual information between fairness and privacy neurons, effectively mitigating their trade-off in LLMs.
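For reference, the quantity SPIN is said to reduce can be written with the standard definition of mutual information between two discrete random variables. Here $X$ and $Y$ are placeholder symbols (an assumption of this note) standing for fairness-related and privacy-related representations; the paper's exact formulation is not reproduced in this summary.

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},
\qquad I(X;Y) \ge 0,
\qquad I(X;Y) = 0 \iff X \text{ and } Y \text{ are independent.}
```

Read this way, lowering $I(X;Y)$ means making the two kinds of awareness less statistically dependent, so that changing one is less likely to degrade the other.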
Findings
SPIN improves fairness awareness by 12.2%.
SPIN enhances privacy awareness by 14.0%.
SPIN remains effective with limited or malicious data.
Abstract
Ensuring awareness of fairness and privacy in Large Language Models (LLMs) is critical. Interestingly, we discover a counter-intuitive trade-off phenomenon: enhancing an LLM's privacy awareness through Supervised Fine-Tuning (SFT) with thousands of samples significantly decreases its fairness awareness. To address this issue, inspired by information theory, we introduce a training-free method to Suppress the Privacy and faIrness coupled Neurons (SPIN), which theoretically and empirically decreases the mutual information between fairness and privacy awareness. Extensive experimental results demonstrate that SPIN eliminates the trade-off phenomenon and significantly improves LLMs' fairness and privacy awareness simultaneously without compromising general capabilities, e.g., improving Qwen-2-7B-Instruct's fairness awareness by…
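The idea of suppressing coupled neurons can be illustrated with a minimal sketch. It assumes, purely for illustration, that each weight already has a fairness importance score and a privacy importance score, that "coupled" means ranking in the top k under both scores, and that suppression means setting those weights to zero. The function names, the scoring inputs, and the top-k rule are hypothetical and are not taken from the paper.

```python
import numpy as np

def top_k_indices(scores, k):
    """Return the flat indices of the k largest entries of `scores` as a set."""
    order = np.argsort(scores, axis=None)[::-1]  # descending order of flat indices
    return set(order[:k].tolist())

def suppress_coupled(weights, fair_scores, priv_scores, k):
    """Zero the weights that rank in the top k for BOTH score arrays.

    Hypothetical illustration only: the scoring method and the top-k
    coupling rule are assumptions, not the paper's procedure.
    """
    coupled = top_k_indices(fair_scores, k) & top_k_indices(priv_scores, k)
    mask = np.ones(weights.size, dtype=bool)
    idx = np.array(sorted(coupled), dtype=np.intp)  # typed so an empty set is safe
    mask[idx] = False
    return weights * mask.reshape(weights.shape), coupled

# Toy example: one weight is important for both fairness and privacy.
weights = np.ones((2, 3))
fair_scores = np.array([[9.0, 1.0, 1.0], [1.0, 8.0, 1.0]])
priv_scores = np.array([[9.0, 1.0, 1.0], [1.0, 1.0, 8.0]])
new_weights, coupled = suppress_coupled(weights, fair_scores, priv_scores, k=2)
```

Because the sketch only masks existing weights, it involves no gradient updates, which matches the training-free framing in the abstract; how the real method scores and selects neurons should be taken from the paper itself.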
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
Methods: Shrink and Fine-Tune
