MBIAS: Mitigating Bias in Large Language Models While Retaining Context

Shaina Raza; Ananya Raval; Veronica Chatrath

arXiv:2405.11290 · cs.CL · July 1, 2024

Open Access · 1 Repo · 3 Models

TL;DR

MBIAS is a fine-tuning framework for large language models that effectively reduces bias and toxicity while maintaining contextual accuracy, using a custom safety-focused dataset and human-in-the-loop evaluation.

Contribution

Introduces MBIAS, a novel instruction fine-tuning approach with a specialized dataset to mitigate bias and toxicity in LLMs without losing contextual information.

Findings

01. Over 30% reduction in bias and toxicity in standard evaluations

02. More than 90% reduction in bias across diverse demographic tests

03. Datasets and models released for community use and reproducibility

Abstract

The deployment of Large Language Models (LLMs) in diverse applications necessitates an assurance of safety without compromising the contextual integrity of the generated content. Traditional approaches, including safety-specific fine-tuning or adversarial testing, often yield safe outputs at the expense of contextual meaning. This can result in a diminished capacity to handle nuanced aspects of bias and toxicity, such as underrepresentation or negative portrayals across various demographics. To address these challenges, we introduce MBIAS, an LLM framework carefully instruction fine-tuned on a custom dataset designed specifically for safety interventions. MBIAS is designed to significantly reduce biases and toxic elements in LLM outputs while preserving the main information. This work also details our further use of LLMs: as annotator under human supervision and as evaluator of…

Code & Models

Repositories

shainarazavi/mbias (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling