Large Language Models Are Still Misled by Simple Bias Ensembles
Zhouhao Sun, Zhiyuan Kan, Xiao Ding, Li Du, Bibo Cai, Yang Zhao, Bing Qin, Ting Liu

TL;DR
This paper introduces a multi-bias benchmark for large language models, revealing their vulnerability to compounded biases in complex, real-world scenarios, and highlighting the limitations of current debiasing methods.
Contribution
It presents a novel multi-bias benchmark dataset and evaluates LLMs, demonstrating their poor performance against combined biases and exposing gaps in existing debiasing techniques.
Findings
LLMs perform poorly on the multi-bias benchmark.
Existing debiasing methods also perform poorly when several biases are combined in a single sample.
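Real-world samples are typically confounded by many biases at once, which makes LLM performance unstable in high-stakes settings such as clinical diagnosis and legal document analysis.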
Abstract
With the evolution of large language models (LLMs), their robustness against individual simple biases has been enhanced. However, we observe that the ensemble of multiple simple biases still exerts a significant adverse impact on LLMs. Given that real-world data samples are typically confounded by a wide range of biases, LLMs tend to exhibit unstable performance when deployed in high-stakes real-world scenarios such as clinical diagnosis and legal document analysis. However, previous benchmarks are constrained to datasets where each sample is manually injected with only one type of bias. To bridge this gap, we propose a multi-bias benchmark where each sample contains multiple types of biases. Experimental results reveal that existing LLMs and debiasing methods perform poorly on this benchmark, highlighting the challenge of eliminating such compounded biases.
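To make the idea of a sample that "contains multiple types of biases" concrete, here is a minimal Python sketch of composing several simple bias injections on one multiple-choice item. It is only an illustration under stated assumptions: the `Sample` type, the three injector functions, and the specific bias types (position, length, cue word) are hypothetical and are not taken from the paper, which may construct its benchmark quite differently.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical multiple-choice item (not the paper's data format)."""
    question: str
    options: Tuple[str, ...]
    label: int  # index of the correct option


# Each injector below is an illustrative assumption, not a bias type
# confirmed by the paper. Each one adds a single spurious cue that
# correlates with the correct answer.

def position_bias(s: Sample) -> Sample:
    """Move the correct option to the first position."""
    opts = list(s.options)
    correct = opts.pop(s.label)
    return replace(s, options=tuple([correct] + opts), label=0)


def length_bias(s: Sample) -> Sample:
    """Pad the correct option so it becomes noticeably longer."""
    opts = list(s.options)
    opts[s.label] = opts[s.label] + " (as described in detail above)"
    return replace(s, options=tuple(opts))


def cue_word_bias(s: Sample) -> Sample:
    """Prefix the correct option with a fixed cue word."""
    opts = list(s.options)
    opts[s.label] = "Indeed, " + opts[s.label]
    return replace(s, options=tuple(opts))


def inject_all(s: Sample, injectors: Sequence[Callable[[Sample], Sample]]) -> Sample:
    """Apply every injector in order, so one sample carries all the cues."""
    for inject in injectors:
        s = inject(s)
    return s


original = Sample("Which option is correct?", ("a", "b", "c"), label=2)
multi_biased = inject_all(original, [position_bias, length_bias, cue_word_bias])
```

A single-bias benchmark would correspond to passing one injector at a time; the multi-bias setting in this sketch simply stacks them, so a model can latch onto any of several shortcuts in the same sample.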
