RobustDebias: Debiasing Language Models using Distributionally Robust Optimization

Deep Gandhi; Katyani Singh; Nidhi Hegde

arXiv:2602.00405·cs.AI·February 3, 2026

TL;DR

RobustDebias is a fine-tuning method that applies Distributionally Robust Optimization (DRO) to reduce bias in pretrained language models such as BERT, with minimal impact on model performance.

Contribution

The paper proposes RobustDebias, a fine-tuning approach that applies Distributionally Robust Optimization to mitigate bias across multiple demographics during language model fine-tuning.

Findings

01. Significant bias reduction across multiple demographics.

02. Minimal impact on language model performance.

03. Generalizes to various datasets and tasks.

Abstract

Pretrained language models have been shown to exhibit biases and social stereotypes. Prior work on debiasing these models has largely focused on modifying embedding spaces during pretraining, which is not scalable for large models. Fine-tuning pretrained models on task-specific datasets can both degrade model performance and amplify biases present in the fine-tuning data. We address bias amplification during fine-tuning rather than costly pretraining, focusing on BERT models due to their widespread use in language understanding tasks. While Empirical Risk Minimization effectively optimizes downstream performance, it often amplifies social biases during fine-tuning. To counter this, we propose RobustDebias, a novel mechanism which adapts Distributionally Robust Optimization (DRO) to debias language models during fine-tuning. Our approach debiases models across multiple…
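
The contrast the abstract draws between Empirical Risk Minimization and DRO can be illustrated with a small sketch. This is a generic group-DRO-style example, not the RobustDebias algorithm itself, whose details are cut off in the abstract above. The function names, the toy loss values, the group labels, and the exponentiated-gradient weight update are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def group_mean_losses(losses, groups):
    """Average the per-example losses within each group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def erm_objective(losses):
    """ERM: the plain average over all examples, regardless of group."""
    return sum(losses) / len(losses)


def worst_group_objective(losses, groups):
    """Group-DRO-style objective: the largest per-group average loss."""
    return max(group_mean_losses(losses, groups).values())


def update_group_weights(weights, group_losses, step_size=1.0):
    """Exponentiated-gradient step: upweight high-loss groups, then renormalize."""
    raw = {g: w * math.exp(step_size * group_losses[g]) for g, w in weights.items()}
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}


# Hypothetical per-example losses and demographic group labels (illustrative only).
losses = [0.2, 0.3, 0.1, 1.5, 1.7]
groups = ["a", "a", "a", "b", "b"]

erm = erm_objective(losses)
worst = worst_group_objective(losses, groups)
new_weights = update_group_weights({"a": 0.5, "b": 0.5}, group_mean_losses(losses, groups))

print(f"ERM objective:         {erm:.2f}")
print(f"Worst-group objective: {worst:.2f}")
print(f"Updated group weights: {new_weights}")
```

In this toy setting the ERM average is pulled down by the larger, low-loss group, while the worst-group objective is determined entirely by the smaller, high-loss group. That gap is the kind of disparity a DRO-style objective is meant to expose during fine-tuning.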

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Topic Modeling