The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models

Chen Qian; Dongrui Liu; Jie Zhang; Yong Liu; Jing Shao

arXiv:2410.16672·cs.AI·June 4, 2025

TL;DR

This paper introduces SPIN, a training-free method inspired by information theory, to simultaneously improve fairness and privacy awareness in large language models, overcoming a trade-off observed with traditional fine-tuning methods.

Contribution

The paper proposes SPIN, a novel training-free approach that reduces the mutual information between fairness and privacy neurons, effectively mitigating their trade-off in LLMs.
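The idea of suppressing neurons that are coupled to both fairness and privacy can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-weight importance scores, the top-fraction selection rule, zeroing as the form of suppression, and the hypothetical helper names `top_fraction_mask` and `suppress_coupled`.

```python
import numpy as np


def top_fraction_mask(scores, fraction):
    """Boolean mask over the highest-scoring `fraction` of entries.

    Ties at the threshold are all included, so the mask can mark
    slightly more than `fraction` of the entries.
    """
    k = max(1, int(round(fraction * scores.size)))
    # Value of the k-th largest score, used as the cut-off.
    threshold = np.partition(scores.ravel(), -k)[-k]
    return scores >= threshold


def suppress_coupled(weights, fairness_scores, privacy_scores, fraction=0.1):
    """Zero the weights that rank highly for BOTH fairness and privacy.

    Assumption: suppression is modelled as setting the weight to zero,
    and "coupled" means being in the top `fraction` of both score arrays.
    No gradient step or retraining is involved.
    """
    coupled = (top_fraction_mask(fairness_scores, fraction)
               & top_fraction_mask(privacy_scores, fraction))
    suppressed = np.where(coupled, 0.0, weights)
    return suppressed, coupled


# Toy data: a 2x3 weight matrix with made-up importance scores.
weights = np.arange(1, 7, dtype=float).reshape(2, 3)
fairness_scores = np.array([[0.9, 0.1, 0.8], [0.2, 0.3, 0.05]])
privacy_scores = np.array([[0.7, 0.6, 0.1], [0.05, 0.2, 0.3]])

suppressed, coupled = suppress_coupled(
    weights, fairness_scores, privacy_scores, fraction=0.5
)
print(coupled)
print(suppressed)
```

In this toy case only the top-left weight ranks in the upper half of both score arrays, so it is the only one zeroed. How the actual method scores and selects neurons is not stated on this page.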

Findings

1. SPIN improves fairness awareness by 12.2%.

2. SPIN enhances privacy awareness by 14.0%.

3. SPIN remains effective with limited or malicious data.

Abstract

Ensuring awareness of fairness and privacy in Large Language Models (LLMs) is critical. Interestingly, we discover a counter-intuitive trade-off phenomenon: enhancing an LLM's privacy awareness through Supervised Fine-Tuning (SFT) methods with thousands of samples significantly decreases its fairness awareness. To address this issue, inspired by information theory, we introduce a training-free method to Suppress the Privacy and faIrness coupled Neurons (SPIN), which theoretically and empirically decreases the mutual information between fairness and privacy awareness. Extensive experimental results demonstrate that SPIN eliminates the trade-off phenomenon and significantly improves LLMs' fairness and privacy awareness simultaneously without compromising general capabilities, e.g., improving Qwen-2-7B-Instruct's fairness awareness by…
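For reference, the quantity the abstract says is decreased is mutual information. The standard definition for two discrete random variables is given below. Reading $X$ and $Y$ as fairness-related and privacy-related representations is an illustrative assumption, not notation taken from the paper.

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},
\qquad
I(X;Y) = 0 \iff X \text{ and } Y \text{ are independent.}
```

Under this reading, lowering $I(X;Y)$ means that knowing one representation tells you less about the other, which is one way to interpret decoupling the two kinds of awareness.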

Code & Models

Repositories

chnq/dean (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning

Methods: Shrink and Fine-Tune