How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation
Eduardo Tenorio, Karuna Bhaila, Xintao Wu

TL;DR
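This paper systematically evaluates how differential privacy affects social bias in large language models across four evaluation paradigms. DP reduces bias in sentence scoring, but the reduction does not carry over to every task, which argues for assessing fairness across multiple paradigms rather than a single one.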
Contribution
The paper provides the first systematic analysis of the relationship between differential privacy and social bias in LLMs across four evaluation paradigms: sentence scoring, text completion, tabular classification, and question answering.
Findings
DP reduces bias in sentence scoring tasks
Bias reduction does not generalize across all tasks
Decreasing memorization does not necessarily reduce unfairness
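For readers unfamiliar with sentence scoring, the sketch below shows one common way such bias is quantified: compare the log-likelihood a model assigns to a stereotypical sentence against a minimally different anti-stereotypical one, and report how often the stereotypical version wins. This is an illustrative assumption about the general technique, not the paper's exact metric; the function names and the toy log-probabilities are hypothetical.

```python
def sentence_log_likelihood(token_log_probs):
    """Sum per-token log-probabilities to get a sentence-level score."""
    return sum(token_log_probs)


def stereotype_preference_rate(pairs):
    """Fraction of pairs where the stereotypical sentence scores higher.

    `pairs` is a list of (stereo_token_log_probs, anti_token_log_probs).
    A rate near 0.5 indicates no systematic preference under this metric.
    """
    if not pairs:
        raise ValueError("need at least one sentence pair")
    preferred = sum(
        1
        for stereo, anti in pairs
        if sentence_log_likelihood(stereo) > sentence_log_likelihood(anti)
    )
    return preferred / len(pairs)


# Toy, made-up per-token log-probabilities (NOT results from the paper).
toy_pairs = [
    ([-1.0, -2.0, -0.5], [-1.0, -2.5, -0.5]),  # stereotypical scores higher
    ([-0.8, -1.9], [-0.8, -1.2]),              # anti-stereotypical scores higher
    ([-2.0, -0.3], [-2.0, -0.9]),              # stereotypical scores higher
    ([-1.5, -1.5], [-1.0, -1.0]),              # anti-stereotypical scores higher
]
rate = stereotype_preference_rate(toy_pairs)
print(rate)
```

In practice the per-token log-probabilities would come from the model under evaluation. Because this metric reads the model's scores directly, it can move independently of bias measured on generated outputs.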
Abstract
Large language models (LLMs) trained on web-scale corpora can memorize sensitive training data, posing significant privacy risks. Differential privacy (DP) has emerged as a principled framework that limits the influence of individual data points during training, yet the relationship between differential privacy and social bias in LLMs remains poorly understood. To investigate this, we present a systematic evaluation of social bias in a pretrained LLM trained with DP-SGD, comparing a DP model against non-DP baselines across four complementary paradigms: sentence scoring, text completion, tabular classification, and question answering. We find that DP reduces bias in sentence scoring tasks, where bias is measured through controlled likelihood comparisons, yet this improvement does not generalize across all tasks. Our results reveal a discrepancy between logit-level bias and output-level…
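As background on how DP-SGD limits the influence of individual data points, the sketch below shows a single update in its standard form: clip each per-example gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping norm, and average. It is a minimal pure-Python illustration, not the paper's training setup; the function names, parameters, and toy values are hypothetical, and privacy accounting (tracking the privacy budget) is omitted.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_gradient(grad, clip_norm):
    """Scale `grad` down so its L2 norm is at most `clip_norm`."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add noise, average, step."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    batch_size = len(clipped)
    summed = [sum(g[i] for g in clipped) for i in range(len(params))]
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Toy example with made-up gradients (hypothetical values).
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient exceeds the clip norm
new_params = dp_sgd_step(
    params, grads, lr=1.0, clip_norm=1.0, noise_multiplier=1.0, rng=rng
)
print(new_params)
```

Clipping bounds how much any single example can shift the parameters, and the noise masks whatever contribution remains. Setting `noise_multiplier` to 0 recovers plain SGD with per-example clipping, which is a useful way to check the clipping logic in isolation.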
