Large Language Models Develop Novel Social Biases Through Adaptive Exploration
Addison J. Wu, Ryan Liu, Xuechunzi Bai, Thomas L. Griffiths

TL;DR
This paper shows that large language models can spontaneously develop new social biases through exploration, which can lead to unfair task allocations, and proposes interventions to mitigate this effect.
Contribution
It introduces a psychology-inspired paradigm to demonstrate emergent biases in LLMs and evaluates interventions to reduce stratification caused by exploration-exploitation trade-offs.
Findings
Explicit exploration incentives most effectively reduce stratification.
Newer and larger models exacerbate bias and stratification.
Emergent biases are actively created by models, not just inherited.
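The exploration-exploitation mechanism behind these findings can be illustrated with a toy simulation. The sketch below is not the paper's paradigm or its metrics: it assumes a simple epsilon-greedy allocator, groups with identical success probability, and a hypothetical `top_share` measure (the fraction of tasks given to the most-chosen group) as a stand-in for stratification. All function and parameter names are illustrative.

```python
import random


def simulate(n_groups=4, n_rounds=200, success_prob=0.5, epsilon=0.0, seed=0):
    """Allocate one task per round to a group and return per-group task counts.

    Assumption for illustration: every group has the SAME success_prob,
    so any preference the allocator forms is an artifact of its own sampling.
    """
    rng = random.Random(seed)
    successes = [0] * n_groups
    trials = [0] * n_groups
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick a group uniformly at random.
            choice = rng.randrange(n_groups)
        else:
            # Exploit: pick the group with the best smoothed success estimate,
            # breaking exact ties at random.
            est = [(successes[g] + 1) / (trials[g] + 2) for g in range(n_groups)]
            best = max(est)
            choice = rng.choice([g for g in range(n_groups) if est[g] == best])
        trials[choice] += 1
        if rng.random() < success_prob:
            successes[choice] += 1
    return trials


def top_share(trials):
    """Hypothetical stratification proxy: share of tasks given to the most-chosen group."""
    return max(trials) / sum(trials)


def mean_top_share(epsilon, n_seeds=200):
    """Average the proxy over several seeds for a given exploration rate."""
    shares = [top_share(simulate(epsilon=epsilon, seed=s)) for s in range(n_seeds)]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    print("greedy (epsilon=0.0):", round(mean_top_share(0.0), 3))
    print("uniform (epsilon=1.0):", round(mean_top_share(1.0), 3))
```

In this toy setup, the purely greedy allocator would be expected to concentrate tasks on whichever group happened to look good early, while forced exploration keeps allocations closer to even. That mirrors the qualitative claim above, under the stated assumptions only.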
Abstract
As large language models (LLMs) are adopted into frameworks that grant them the capacity to make real decisions, it is increasingly important to ensure that they are unbiased. In this paper, we argue that the predominant approach of simply removing existing biases from models is not enough. Using a paradigm from the psychology literature, we demonstrate that LLMs can spontaneously develop novel social biases about artificial demographic groups even when no inherent differences exist. These biases result in highly stratified task allocations, which are less fair than assignments by human participants and are exacerbated by newer and larger models. In social science, emergent biases like these have been shown to result from exploration-exploitation trade-offs, where the decision-maker explores too little, allowing early observations to strongly influence impressions about entire…
Peer Reviews
Decision · Submitted to ICLR 2026
- Investigate emerging biases, which have been underexplored
- Test many models across six families and various schemes such as CoT
- Explore interventions to reduce the emerging biases
While the paper is well written and offers insights into emerging biases in LLMs, the paper has limited novelty and contribution in my opinion. The results themselves are straightforward; when only demographic group information is available, models should naturally use that information to maximize their incentives. Providing more information about candidates would have reduced this effect (as demonstrated in the paper), because additional information allows the model to rely on other signals f
The paper is very well-motivated: much work has looked at how LLMs could be biased due to the underlying distribution of human data, but there is little work exploring how LLMs might form novel biases through interactions. I like that the authors engage with the literature across many fields, with a good amount of depth, in making the arguments and describing the background of this study. It is also very nice to have a human baseline in a directly comparable setting.
Weaknesses: 1) The study models agentic behavior using a multi-turn dialogue where the entire history is passed in-context. This setup, while controlled, does not fully capture the architecture of modern agentic systems. Such systems often employ more sophisticated mechanisms like structured memory, explicit reflection steps (e.g., ReAct), and meta-cognitive abilities to decide whether a given experience is valuable enough to be integrated into its knowledge base. By "forcing" the model to lear
1. The multi-turn scenarios that the authors attempt to explore have not been widely studied, which could be regarded as a novel research topic.
2. The authors use multiple LLMs and attempt to evaluate them using different metrics.
1. The authors compare the simulation results of LLMs with those from human participants, but lack descriptions of the human participants, such as the sample size and distribution of demographic variables.
2. The authors need to provide more explanation for the three newly defined metrics. For example, how do SI and mutual information differ in form? What are the similarities and differences among BGD, GASI, and JSD?
3. If I understand correctly, the values of SI and BGD should be 0 under ra
Taxonomy
Topics: Computational and Text Analysis Methods · Language and cultural evolution · Artificial Intelligence in Healthcare and Education
