BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

Zhiting Fan; Ruizhe Chen; Ruiling Xu; Zuozhu Liu

arXiv:2407.10241 · cs.CL · July 23, 2024

TL;DR

BiasAlert is a versatile tool that effectively detects social bias in open-text outputs of LLMs by combining human knowledge and reasoning, outperforming existing methods and aiding bias evaluation and mitigation.

Contribution

We introduce BiasAlert, a novel plug-and-play bias detection tool that adapts to open-text generation scenarios and surpasses current state-of-the-art methods in accuracy.

Findings

01 BiasAlert outperforms GPT4-as-A-Judge in bias detection accuracy.

02 It demonstrates utility in bias evaluation across diverse LLM scenarios.

03 Model and code will be publicly released for community use.

Abstract

Evaluating bias in Large Language Models (LLMs) has become increasingly crucial with their rapid development. However, existing evaluation methods rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in open-text generations of LLMs. BiasAlert integrates external human knowledge with the model's inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT4-as-A-Judge in detecting bias. Furthermore, through application studies, we demonstrate the utility of BiasAlert for reliable LLM bias evaluation and bias mitigation across various scenarios. Model and code will be publicly released.
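
The abstract describes a detector that pairs external human knowledge with a reasoning step over open-text output. The following is only a minimal sketch of that general retrieve-then-judge pattern, not the BiasAlert implementation. The knowledge entries, the token-overlap retriever, the placeholder judge, and every name in the code (KNOWLEDGE, retrieve, placeholder_judge, detect) are assumptions made for illustration; in a real system the judge would presumably be a language model rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical stand-in for an external, human-annotated knowledge base.
# The group names are neutral placeholders, not data from the paper.
KNOWLEDGE: List[str] = [
    "group_a people are lazy",
    "group_b people are bad at math",
]


@dataclass
class Verdict:
    biased: bool
    evidence: List[str]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,!?").lower() for w in text.split()}


def retrieve(text: str, knowledge: List[str], top_k: int = 1) -> List[str]:
    """Return up to top_k knowledge entries sharing at least one token with text."""
    query = tokenize(text)
    scored = [(len(query & tokenize(entry)), entry) for entry in knowledge]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def placeholder_judge(text: str, evidence: List[str], min_overlap: int = 3) -> bool:
    """Assumed rule standing in for a reasoning model: flag the text when it
    shares at least min_overlap tokens with any retrieved entry."""
    query = tokenize(text)
    return any(len(query & tokenize(entry)) >= min_overlap for entry in evidence)


def detect(
    text: str,
    judge: Callable[[str, List[str]], bool] = placeholder_judge,
) -> Verdict:
    """Retrieve supporting knowledge, then ask the judge for a decision."""
    evidence = retrieve(text, KNOWLEDGE)
    return Verdict(biased=judge(text, evidence), evidence=evidence)


flagged = detect("I think group_a people are lazy.")
clean = detect("The weather is nice today.")
print(flagged.biased, flagged.evidence)
print(clean.biased, clean.evidence)
```

Because the judge is passed in as a callable, the overlap rule can be swapped for a model call without changing the retrieval step, which is one way to read "plug-and-play" in this setting.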

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Imbalanced Data Classification Techniques