BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs
Zhiting Fan, Ruizhe Chen, Ruiling Xu, Zuozhu Liu

TL;DR
BiasAlert is a versatile tool that effectively detects social bias in open-text outputs of LLMs by combining human knowledge and reasoning, outperforming existing methods and aiding bias evaluation and mitigation.
Contribution
We introduce BiasAlert, a novel plug-and-play bias detection tool that adapts to open-text generation scenarios and surpasses current state-of-the-art methods in accuracy.
Findings
BiasAlert outperforms GPT4-as-A-Judge in bias detection accuracy.
It demonstrates utility in bias evaluation across diverse LLM scenarios.
Model and code will be publicly released for community use.
Abstract
Evaluating bias in Large Language Models (LLMs) has become increasingly crucial with their rapid development. However, existing evaluation methods rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in open-text generations of LLMs. BiasAlert integrates external human knowledge with the model's inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT4-as-A-Judge in detecting bias. Furthermore, through application studies, we demonstrate the utility of BiasAlert in reliable LLM bias evaluation and bias mitigation across various scenarios. The model and code will be publicly released.
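The abstract describes combining external human knowledge with a model's reasoning, but gives no implementation details. One way such a combination could look is a retrieve-then-judge pipeline, sketched below under stated assumptions. Everything in it is hypothetical and not taken from the BiasAlert release: the placeholder knowledge base, the token-overlap (Jaccard) retrieval, the function names, and the prompt wording are illustrative choices only.

```python
import re

# Hypothetical knowledge base: placeholder statements with human-assigned labels.
# These entries are illustrative and are NOT from the BiasAlert paper or its data.
KNOWLEDGE_BASE = [
    ("people from group a are bad at math", "biased"),
    ("people from group b are always lazy", "biased"),
    ("some people enjoy math and some do not", "unbiased"),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(text, k=2):
    """Return the k knowledge-base entries most similar to the text.

    Token overlap is an assumed stand-in for whatever retriever a real
    system would use (for example, dense embeddings).
    """
    query = tokenize(text)
    scored = [(jaccard(query, tokenize(s)), s, label) for s, label in KNOWLEDGE_BASE]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_judge_prompt(text, k=2):
    """Assemble a prompt that pairs retrieved references with the text to assess.

    The prompt would then be sent to a judging language model; that call is
    left out because the model interface is not specified in the source.
    """
    lines = ["Reference statements with human labels:"]
    for _score, statement, label in retrieve(text, k):
        lines.append(f"- [{label}] {statement}")
    lines.append(f"Text to assess: {text}")
    lines.append("Is the text socially biased? Answer 'biased' or 'unbiased' and explain.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_judge_prompt("People from group A are terrible at math"))
```

The retrieval step supplies human-labeled context, and the judging step is where a language model's reasoning would come in. Because the detector only needs the generated text as input, a design of this kind can be attached to any open-text task such as sentence completion or question answering.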
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Imbalanced Data Classification Techniques
