Identity-Robust Language Model Generation via Content Integrity Preservation
Miao Zhang, Kelly Chen, Md Mehrab Tanjim, Rumi Chunara

TL;DR
This paper introduces a training-free method to reduce identity-dependent bias in large language models (LLMs), yielding more consistent and fair responses across diverse sociodemographic groups while preserving content quality.
Contribution
The paper proposes a novel, lightweight framework that neutralizes non-essential identity information at inference time, without retraining, to reduce identity-dependent bias in LLM outputs.
Findings
Average 77% reduction in identity bias compared to standard prompting
45% reduction relative to prompt-based defenses
Effective across four benchmarks and 18 sociodemographic identities
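The findings report bias reductions as relative percentages. This summary does not define the underlying bias metric, so the following is only a minimal sketch under an assumed metric: bias is taken as the gap between the best- and worst-served identity's score, and a "reduction" is the fraction of the baseline gap that is removed. The function names `identity_gap` and `relative_reduction` and the toy scores are hypothetical and are not results from the paper.

```python
def identity_gap(per_identity_scores: dict[str, float]) -> float:
    """Assumed bias metric (hypothetical): best minus worst per-identity score."""
    if not per_identity_scores:
        raise ValueError("need at least one identity score")
    scores = per_identity_scores.values()
    return max(scores) - min(scores)


def relative_reduction(baseline_bias: float, new_bias: float) -> float:
    """Fraction of the baseline bias removed; 0.77 would read as a 77% reduction."""
    if baseline_bias == 0:
        raise ValueError("baseline bias must be non-zero")
    return (baseline_bias - new_bias) / baseline_bias


if __name__ == "__main__":
    # Toy per-identity accuracies on one benchmark; NOT numbers from the paper.
    standard_prompting = {"identity_a": 0.90, "identity_b": 0.80, "identity_c": 0.70}
    with_method = {"identity_a": 0.86, "identity_b": 0.84, "identity_c": 0.85}
    base = identity_gap(standard_prompting)
    new = identity_gap(with_method)
    # Prints "relative reduction: 90%" for these toy values.
    print(f"relative reduction: {relative_reduction(base, new):.0%}")
```

Under this reading, an "average" figure would come from computing such a reduction per benchmark and averaging the results; the paper's exact aggregation may differ.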
Abstract
Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, despite factual knowledge being robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thus maintaining output content integrity. Experiments across four benchmarks and 18 sociodemographic identities demonstrate an average 77% reduction in…
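The abstract describes the core idea, selectively neutralizing non-critical identity information while keeping attributes that matter for the answer, but not the mechanism. As a toy illustration of that idea only, and not the authors' framework, the sketch below drops a self-identification clause such as "As a Muslim, " unless the attribute it names is flagged as essential to the question. The lexicon, the clause pattern, the `essential` set, and `neutralize_identity` are all hypothetical; a real system would need far richer identity detection than a keyword regex.

```python
import re

# Hypothetical toy lexicon mapping an identity word to its attribute type.
IDENTITY_LEXICON = {
    "muslim": "religion",
    "christian": "religion",
    "elderly": "age",
    "immigrant": "nationality",
    "pregnant": "pregnancy",
}

# Matches clauses like "As a Muslim, " or "as an elderly person, ".
# Group 1 is the first word after the article; only that word is checked.
CLAUSE = re.compile(r"\bAs an? (\w+)(?: \w+)?, ", flags=re.IGNORECASE)


def neutralize_identity(prompt: str, essential: set[str]) -> str:
    """Drop identity clauses whose attribute is not essential to the question."""

    def replace(match: re.Match) -> str:
        attribute = IDENTITY_LEXICON.get(match.group(1).lower())
        if attribute is None or attribute in essential:
            return match.group(0)  # unknown or task-relevant cue: keep it
        return ""  # non-essential identity cue: remove the clause

    text = CLAUSE.sub(replace, prompt)
    # Re-capitalize in case the removed clause opened the prompt.
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(neutralize_identity("As a Muslim, what year did World War II end?", set()))
    print(neutralize_identity(
        "As a pregnant woman, is it safe to take ibuprofen?", {"pregnancy"}))
```

In the first prompt the religious identity is irrelevant to a factual question and is removed; in the second, pregnancy is marked essential because it changes the correct answer, so the clause is preserved.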
