How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation

Eduardo Tenorio; Karuna Bhaila; Xintao Wu

arXiv:2605.11195 · cs.CL · May 13, 2026

TL;DR

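This paper systematically evaluates how differential privacy affects social bias in a large language model across four paradigms: sentence scoring, text completion, tabular classification, and question answering. DP reduces bias in sentence scoring, but the improvement does not carry over to every task, which is why the authors emphasize assessing fairness across multiple paradigms.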

Contribution

It provides the first systematic analysis of the relationship between differential privacy and social bias in LLMs across diverse evaluation paradigms.

Findings

01. DP reduces bias in sentence scoring tasks

02. Bias reduction does not generalize across all tasks

03. Decreasing memorization does not necessarily reduce unfairness

Abstract

Large language models (LLMs) trained on web-scale corpora can memorize sensitive training data, posing significant privacy risks. Differential privacy (DP) has emerged as a principled framework that limits the influence of individual data points during training, yet the relationship between differential privacy and social bias in LLMs remains poorly understood. To investigate this, we present a systematic evaluation of social bias in a pretrained LLM trained with DP-SGD, comparing a DP model against non-DP baselines across four complementary paradigms: sentence scoring, text completion, tabular classification, and question answering. We find that DP reduces bias in sentence scoring tasks, where bias is measured through controlled likelihood comparisons, yet this improvement does not generalize across all tasks. Our results reveal a discrepancy between logit-level bias and output-level…
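
For readers unfamiliar with DP-SGD, the mechanism that "limits the influence of individual data points" can be sketched as follows. This is a generic illustration of the standard recipe (clip each example's gradient, add Gaussian noise, then average), not the authors' training code. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example, and privacy accounting is omitted entirely.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update on a flat list of parameters.

    per_example_grads holds one gradient vector per training example.
    All names here are assumptions made for this sketch.
    """
    n = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        # L2 norm of this single example's gradient.
        norm = math.sqrt(sum(g * g for g in grad))
        # Shrink the gradient if its norm exceeds clip_norm; otherwise keep it.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Gaussian noise with standard deviation tied to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    # Plain gradient descent step on the noisy averaged gradient.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient has norm 5 and gets clipped
    print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any one example can move the parameters, and the noise masks what remains of that contribution. Real training would typically rely on a DP library and track the resulting privacy budget rather than use a hand-written step like this one.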
