White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs
Yixin Wan, Kai-Wei Chang

TL;DR
This paper introduces LABE, a benchmark for evaluating social biases related to language agency in LLMs, revealing significant gender and intersectional biases, and proposes MSR, a new bias mitigation method that outperforms prompt-based approaches.
Contribution
The paper develops LABE, a comprehensive benchmark for assessing language agency biases in LLMs, and proposes MSR, a novel, more effective bias mitigation strategy.
Findings
LLM generations tend to exhibit greater gender bias than human-written texts.
Intersectional bias levels are notably higher in LLMs.
Prompt-based mitigation often worsens biases.
Abstract
Social biases can manifest in language agency. However, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, falling short of accurately classifying language agency. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE tests for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than…
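As a rough illustration of what "analyzing agency levels attributed to different demographic groups" could look like in practice, the sketch below compares the share of agentic sentences across groups. It assumes each generated sentence has already been labeled "agentic" or "communal" by some classifier (not shown here). The function names, the group names, the toy records, and the max-minus-min gap are all hypothetical choices made for illustration, not the metric defined by LABE.

```python
from collections import defaultdict


def agentic_share(labels):
    """Return the fraction of labels equal to 'agentic'."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(1 for lab in labels if lab == "agentic") / len(labels)


def agency_by_group(records):
    """Map each group to its agentic share.

    records: iterable of (group, label) pairs, where label is
    'agentic' or 'communal' (assumed classifier output).
    """
    grouped = defaultdict(list)
    for group, label in records:
        grouped[group].append(label)
    return {g: agentic_share(labs) for g, labs in grouped.items()}


def agency_gap(shares):
    """Illustrative gap: largest minus smallest agentic share."""
    return max(shares.values()) - min(shares.values())


# Hypothetical toy data: four labeled sentences per group.
records = [
    ("group_a", "agentic"), ("group_a", "agentic"),
    ("group_a", "communal"), ("group_a", "agentic"),
    ("group_b", "agentic"), ("group_b", "communal"),
    ("group_b", "communal"), ("group_b", "communal"),
]

shares = agency_by_group(records)
gap = agency_gap(shares)
print(shares, gap)
```

A sentence-level classifier feeding this kind of aggregation is one way to avoid the string-matching limitation the abstract mentions, since the label depends on the whole sentence rather than on the presence of a listed word.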
Taxonomy
Topics: Gender Studies in Language · Multilingual Education and Policy
Methods: ALIGN
