De-amplifying Bias from Differential Privacy in Language Model Fine-tuning

Sanjari Srivastava; Piotr Mardziel; Zhikhun Zhang; Archana Ahlawat; Anupam Datta; John C Mitchell

arXiv:2402.04489 · cs.LG · February 8, 2024 · 1 citation

TL;DR

This paper investigates how differential privacy during fine-tuning of large language models amplifies social biases, and proposes combining counterfactual data augmentation with DP to mitigate bias while preserving privacy.

Contribution

It reveals that DP amplifies biases in LLM fine-tuning and demonstrates that CDA can reduce this bias amplification, enabling fair and private model training.

Findings

01. DP amplifies gender, racial, and religious bias in LLMs

02. CDA mitigates bias amplification caused by DP

03. Combining DP and CDA maintains fairness and privacy
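
To make the CDA step in these findings concrete, here is a minimal sketch of counterfactual data augmentation for binary gender. The word-pair list, the function names `counterfactual` and `augment`, and the tokenization are illustrative assumptions, not the authors' implementation. Ambiguous pairs such as "her" (which can map to "his" or "him") are left out on purpose because they need part-of-speech information.

```python
import re

# Hypothetical, deliberately tiny word-pair list (an assumption for
# illustration; a real CDA pipeline would use a much larger curated list).
GENDER_PAIRS = [("he", "she"), ("man", "woman"),
                ("father", "mother"), ("son", "daughter")]

# Build a symmetric lookup so each word maps to its counterpart.
SWAP = {}
for a, b in GENDER_PAIRS:
    SWAP[a] = b
    SWAP[b] = a

_WORD = re.compile(r"\b\w+\b")


def counterfactual(text):
    """Return text with every listed gendered word replaced by its pair."""
    def repl(match):
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # not a listed gendered word: keep unchanged
        # Preserve a leading capital letter ("He" -> "She").
        return target.capitalize() if word[0].isupper() else target
    return _WORD.sub(repl, text)


def augment(corpus):
    """Keep each original example and add its counterfactual if it differs."""
    out = []
    for text in corpus:
        out.append(text)
        swapped = counterfactual(text)
        if swapped != text:
            out.append(swapped)
    return out
```

In this sketch, `augment(["He is a father.", "It rained."])` keeps both originals and adds one swapped copy of the first sentence. The augmented corpus would then be used for fine-tuning in place of the original one.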

Abstract

Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models (LLMs), producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification…
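
The abstract does not say which DP mechanism is used. Assuming a DP-SGD-style update (per-example gradient clipping followed by Gaussian noise), the sketch below shows how such a step limits any single example's influence, plus a simple per-group gradient-norm diagnostic of the kind one might use to look for the convergence disparity mentioned above. All names (`clip`, `dp_mean_gradient`, `mean_grad_norm_by_group`) and parameters are hypothetical, and the diagnostic is an illustration, not the paper's measurement.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Assumption: this mirrors the usual DP-SGD recipe, with noise standard
    deviation equal to noise_multiplier * max_norm.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


def mean_grad_norm_by_group(per_example_grads, groups):
    """Average unclipped gradient norm per sub-group label (a diagnostic)."""
    sums, counts = {}, {}
    for grad, label in zip(per_example_grads, groups):
        norm = math.sqrt(sum(g * g for g in grad))
        sums[label] = sums.get(label, 0.0) + norm
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}
```

A group whose average gradient norm stays well above the clipping threshold has its examples scaled down more heavily than the others at every step, which is one plausible way unequal convergence across sub-groups could arise under this assumed mechanism.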

Taxonomy

Topics: Privacy-Preserving Technologies in Data