Inference-Time Reasoning Selectively Reduces Implicit Social Bias in Large Language Models
Molly Apsel, Michael N. Jones

TL;DR
This paper investigates whether inference-time reasoning in large language models reduces implicit social bias. It finds that enabling reasoning changes fairness outcomes for some models and highlights the role of cognitive science theories in AI evaluation.
Contribution
The paper demonstrates that enabling inference-time reasoning significantly reduces measured implicit social bias on an IAT-style evaluation for some LLM model classes, which bears on bias mitigation at inference time.
Findings
Reasoning reduces implicit bias on IAT-style tests for some models
The bias reduction effect is specific to social domains
Inference-time reasoning does not reduce non-social implicit associations
Abstract
Drawing on constructs from psychology, prior work has identified a distinction between explicit and implicit bias in large language models (LLMs). While many LLMs undergo post-training alignment and safety procedures to avoid expressions of explicit social bias, they still exhibit significant implicit biases on indirect tasks resembling the Implicit Association Test (IAT). Recent work has further shown that inference-time reasoning can impair LLM performance on tasks that rely on implicit statistical learning. Motivated by a theoretical link between implicit associations and statistical learning in human cognition, we examine how reasoning-enabled inference affects implicit bias in LLMs. We find that enabling reasoning significantly reduces measured implicit bias on an IAT-style evaluation for some model classes across fifteen stereotype topics. This effect appears specific to social…
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Language and cultural evolution · Computational and Text Analysis Methods
