BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization

Ahmed Allam

arXiv:2407.13928·cs.CL·July 22, 2024

TL;DR

This paper presents BiasDPO, a framework that uses Direct Preference Optimization to reduce biases in language models, improving ethical language generation and outperforming baseline models on bias benchmarks.

Contribution

Introduces BiasDPO, a bias mitigation method based on Direct Preference Optimization, together with a new manually designed dataset for training LLMs to recognize and correct bias.

Findings

1. Significant reduction in biased outputs in the Microsoft Phi-2 model.

2. Outperforms baseline and open-source models on bias benchmarks.

3. Public release of BiasDPO dataset for further research.

Abstract

Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns. This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in LLM-generated English text. By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language in LLMs. We also contribute a manually designed dataset for training LLMs to recognize and correct biases. This dataset encompasses a diverse range of prompts paired with both biased and unbiased completions. Implementing this approach on the Microsoft Phi-2 model, we demonstrate substantial reductions in biased outputs as our model outperforms the baseline model on almost all bias benchmarks. Our model also…
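
The abstract describes a loss that favors less biased over biased completions but does not spell it out. As a rough illustration, the sketch below implements the standard DPO objective for a single preference pair, assuming the unbiased completion plays the role of the "chosen" response and the biased one the "rejected" response. The function names, the argument names, and the default `beta=0.1` are assumptions made for this example, not details taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_unbiased: float,
    policy_logp_biased: float,
    ref_logp_unbiased: float,
    ref_logp_biased: float,
    beta: float = 0.1,  # assumed value; the abstract does not state it
) -> float:
    """Standard DPO loss for one (prompt, unbiased, biased) triple.

    Each argument is the summed log-probability of a completion given the
    prompt, under either the model being trained (policy) or a frozen
    reference model.
    """
    # How much more the policy likes each completion than the reference does.
    chosen_logratio = policy_logp_unbiased - ref_logp_unbiased
    rejected_logratio = policy_logp_biased - ref_logp_biased
    # The loss shrinks as the policy favors the unbiased completion more.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2 (about 0.693). Raising the policy's log-probability of the unbiased completion relative to the biased one pushes the loss below that value. In practice this would be computed over batches of token-level log-probabilities from the model, which is omitted here.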

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems