RobustDebias: Debiasing Language Models using Distributionally Robust Optimization
Deep Gandhi, Katyani Singh, Nidhi Hegde

TL;DR
RobustDebias introduces a novel fine-tuning method using Distributionally Robust Optimization to effectively reduce biases in pretrained language models like BERT without sacrificing performance.
Contribution
The paper proposes RobustDebias, a new fine-tuning approach that applies Distributionally Robust Optimization to mitigate biases across demographics during language model training.
Findings
Significant bias reduction across multiple demographics.
Minimal impact on language model performance.
Generalizes to various datasets and tasks.
Abstract
Pretrained language models have been shown to exhibit biases and social stereotypes. Prior work on debiasing these models has largely focused on modifying embedding spaces during pretraining, which is not scalable for large models. Fine-tuning pretrained models on task-specific datasets can both degrade model performance and amplify biases present in the fine-tuning data. We address bias amplification during fine-tuning rather than costly pretraining, focusing on BERT models due to their widespread use in language understanding tasks. While Empirical Risk Minimization effectively optimizes downstream performance, it often amplifies social biases during fine-tuning. To counter this, we propose RobustDebias, a novel mechanism which adapts Distributionally Robust Optimization (DRO) to debias language models during fine-tuning. Our approach debiases models across multiple…
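The contrast the abstract draws between Empirical Risk Minimization and DRO can be illustrated with a small sketch. This is a generic worst-group (group-DRO-style) objective, not necessarily the formulation RobustDebias uses; the function names, loss values, and group labels below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def erm_loss(losses):
    """Empirical Risk Minimization objective: mean loss over all examples."""
    return sum(losses) / len(losses)


def group_losses(losses, groups):
    """Mean loss for each group label."""
    buckets = defaultdict(list)
    for loss, group in zip(losses, groups):
        buckets[group].append(loss)
    return {g: sum(v) / len(v) for g, v in buckets.items()}


def worst_group_loss(losses, groups):
    """Group-DRO-style objective: the largest per-group mean loss."""
    return max(group_losses(losses, groups).values())


# Hypothetical per-example losses and demographic group labels (assumed data).
losses = [0.2, 0.4, 0.3, 0.9, 1.1]
groups = ["a", "a", "a", "b", "b"]

erm = erm_loss(losses)                  # averages over every example
dro = worst_group_loss(losses, groups)  # tracks the worst-served group
```

On this toy data the ERM average is about 0.58, while the worst-group value is about 1.0. An average can look acceptable while one group is served much worse, which is the kind of gap a DRO-style objective is designed to penalize.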
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Topic Modeling
