Does Differential Privacy Impact Bias in Pretrained NLP Models?
Md. Khairul Islam, Andrew Wang, Tianhao Wang, Yangfeng Ji, Judy Fox, Jieyu Zhao

TL;DR
This paper empirically investigates how differential privacy in fine-tuning large language models can increase bias against protected groups, influenced by privacy levels and dataset distribution.
Contribution
It provides the first empirical analysis of DP's impact on bias in LLMs, highlighting increased bias and the influence of dataset distribution.
Findings
DP can increase bias against protected groups
DP affects the model's ability to differentiate group examples
Bias impact depends on privacy level and dataset distribution
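As a rough illustration of what "privacy level" means in practice: DP training is commonly done with DP-SGD, where each example's gradient is clipped and Gaussian noise is added before the update. This page does not say which DP optimizer or settings the authors used, so the sketch below is a generic DP-SGD gradient step under that assumption. The names `dp_sgd_gradient`, `clip_norm`, and `noise_multiplier` are hypothetical, chosen for this example only.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD gradient: clip per example, sum, add noise, average.

    Assumption: this is the standard DP-SGD recipe, not necessarily the
    exact procedure used in the paper.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        # Rescale so that each example's gradient has L2 norm <= clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Toy gradients (made up for illustration): one large, one small.
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
no_noise = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=0.0,
                           rng=random.Random(0))
with_noise = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.0,
                             rng=random.Random(0))
```

A larger `noise_multiplier` corresponds to stronger privacy protection. Clipping also shrinks the contribution of examples with large gradients, which is one frequently suggested reason why DP can affect groups unevenly.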
Abstract
Differential privacy (DP) is applied when fine-tuning pre-trained large language models (LLMs) to limit leakage of training examples. While most DP research has focused on improving a model's privacy-utility tradeoff, some studies find that DP can be unfair to or biased against underrepresented groups. In this work, we show the impact of DP on bias in LLMs through empirical analysis. Differentially private training can increase model bias against protected groups with respect to AUC-based bias metrics. DP makes it more difficult for the model to differentiate between the positive and negative examples from the protected groups and other groups in the rest of the population. Our results also show that the impact of DP on bias is affected not only by the privacy protection level but also by the underlying distribution of the dataset.
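The abstract refers to AUC-based bias metrics without naming them on this page. A minimal sketch, assuming the commonly used trio of subgroup AUC, BPSN AUC (background positive, subgroup negative), and BNSP AUC (background negative, subgroup positive), might look as follows. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bias_aucs(labels, scores, in_group):
    """Compute three AUC-based bias metrics for one protected group."""
    rows = list(zip(labels, scores, in_group))

    def pick(keep):
        kept = [(y, s) for y, s, g in rows if keep(y, g)]
        return auc([y for y, _ in kept], [s for _, s in kept])

    return {
        # Only examples that mention the protected group.
        "subgroup": pick(lambda y, g: g),
        # Negatives from the group vs. positives from everyone else.
        "bpsn": pick(lambda y, g: (g and y == 0) or (not g and y == 1)),
        # Positives from the group vs. negatives from everyone else.
        "bnsp": pick(lambda y, g: (g and y == 1) or (not g and y == 0)),
    }


# Toy data (made up): four group examples, two background examples.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.8, 0.1]
in_group = [True, True, True, True, False, False]
result = bias_aucs(labels, scores, in_group)
```

Under this reading, a lower value on any of the three metrics for a protected group, relative to a non-private baseline, would indicate that the model separates that group's positive and negative examples less well.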
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
