Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models
Yuxuan Li, Hirokazu Shirado, Sauvik Das

TL;DR
This paper introduces a technique for revealing implicit sociodemographic biases in large language models by comparing the decisions made by agents assigned different personas. The biases it uncovers are significant and amplified relative to real-world disparities.
Contribution
The study presents a novel method to systematically detect implicit biases in LLMs through sociodemographically-informed decision scenarios, revealing biases that are often amplified in advanced models.
Findings
State-of-the-art LLMs show significant sociodemographic disparities in nearly all simulations.
More advanced models exhibit greater implicit biases, even as they reduce explicit biases.
The uncovered biases are directionally aligned with real-world disparities but amplified.
Abstract
While advances in fairness and alignment have helped mitigate overt biases exhibited by large language models (LLMs) when explicitly prompted, we hypothesize that these models may still exhibit implicit biases when simulating human behavior. To test this hypothesis, we propose a technique to systematically uncover such biases across a broad range of sociodemographic categories by assessing decision-making disparities among agents with LLM-generated, sociodemographically-informed personas. Using our technique, we tested six LLMs across three sociodemographic groups and four decision-making scenarios. Our results show that state-of-the-art LLMs exhibit significant sociodemographic disparities in nearly all simulations, with more advanced models exhibiting greater implicit biases despite reducing explicit biases. Furthermore, when comparing our findings to real-world disparities reported…
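A decision-making disparity between two persona groups can be illustrated with a small Python sketch. It is not the authors' implementation: the summary above does not say which statistical test the paper uses, so the two-proportion z-test here is an assumption, and the function names (decision_rates, two_proportion_z), the group labels, and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def decision_rates(records):
    """Return {group: (count, rate)} for binary decisions.

    `records` is an iterable of (group, decision) pairs, where decision
    is 1 if the agent took the action of interest and 0 otherwise.
    """
    totals = Counter()
    positives = Counter()
    for group, decision in records:
        totals[group] += 1
        positives[group] += decision
    return {g: (totals[g], positives[g] / totals[g]) for g in totals}


def two_proportion_z(n1, p1, n2, p2):
    """Two-sided two-proportion z-test (an assumed choice of test).

    Returns (z, p_value) under the null hypothesis that both groups
    share the same underlying decision rate.
    """
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical toy data: 100 simulated agents per persona group.
records = (
    [("group_a", 1)] * 60 + [("group_a", 0)] * 40
    + [("group_b", 1)] * 40 + [("group_b", 0)] * 60
)

rates = decision_rates(records)
n_a, p_a = rates["group_a"]
n_b, p_b = rates["group_b"]
z, p_value = two_proportion_z(n_a, p_a, n_b, p_b)
print(f"rate gap = {p_a - p_b:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

In this sketch the disparity is the gap between the two groups' decision rates, and the test asks whether that gap is larger than sampling noise alone would explain. A real analysis would replace the toy records with decisions collected from persona-conditioned agents and would likely correct for multiple comparisons across scenarios and groups.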
Taxonomy
Topics: Language and cultural evolution
