Us-vs-Them bias in Large Language Models
Tabia Tanzin Prama, Julia Witte Zimmerman, Christopher M. Danforth, Peter Sheridan Dodds

TL;DR
This paper explores social identity bias in large language models, revealing how personas influence ingroup/outgroup attitudes and demonstrating a mitigation method to reduce bias.
Contribution
It provides a comprehensive analysis of social bias in LLMs and introduces ION, a fine-tuning approach to mitigate 'us versus them' bias.
Findings
Consistent ingroup-positive and outgroup-negative associations across models
Persona conditioning alters linguistic and semantic patterns
Bias mitigation reduces sentiment divergence by up to 69%
Abstract
This study investigates "us versus them" bias, as described by Social Identity Theory, in large language models (LLMs) under both default and persona-conditioned settings across multiple architectures (GPT-4.1, DeepSeek-3.1, Gemma-2.0, Grok-3.0, and LLaMA-3.1). Using sentiment dynamics, allotaxonometry, and embedding regression, we find consistent ingroup-positive and outgroup-negative associations across foundational LLMs. We find that adopting a persona systematically alters models' evaluative and affiliative language patterns. For the exemplar personas examined, conservative personas exhibit greater outgroup hostility, whereas liberal personas display stronger ingroup solidarity. Persona conditioning produces distinct clustering in embedding space and measurable semantic divergence, supporting the view that even abstract identity cues can shift models' linguistic behavior.…
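As a rough illustration of what an ingroup/outgroup sentiment gap could look like in code, the sketch below scores "We ..." and "They ..." sentences with a tiny word list and takes the difference of the means. Everything here is an assumption made for illustration: the lexicon `TOY_LEXICON`, the function names, the example sentences, and the definition of divergence as a difference of mean scores are hypothetical and are not taken from the paper, whose actual sentiment instrument and metrics are not described on this page.

```python
from statistics import mean

# Hypothetical toy lexicon (an assumption for illustration only).
TOY_LEXICON = {
    "good": 1.0, "kind": 1.0, "honest": 1.0,
    "bad": -1.0, "lazy": -1.0, "dishonest": -1.0,
}

def sentence_sentiment(sentence):
    """Mean lexicon score of the scored words in a sentence (0.0 if none)."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(scores) if scores else 0.0

def sentiment_divergence(ingroup_sentences, outgroup_sentences):
    """Assumed definition: mean ingroup sentiment minus mean outgroup sentiment."""
    ingroup_mean = mean([sentence_sentiment(s) for s in ingroup_sentences])
    outgroup_mean = mean([sentence_sentiment(s) for s in outgroup_sentences])
    return ingroup_mean - outgroup_mean

def relative_reduction(before, after):
    """Fractional reduction in divergence after some mitigation step."""
    return (before - after) / before

# Made-up example sentences, not model outputs from the study.
ingroup = ["We are kind and honest.", "We are good people."]
outgroup = ["They are lazy.", "They are dishonest and bad."]
gap = sentiment_divergence(ingroup, outgroup)
print(gap)
```

A positive gap under this assumed definition would correspond to ingroup-positive and outgroup-negative language; `relative_reduction` shows how a percentage reduction in such a gap could be computed from before and after values.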
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Artificial Intelligence in Healthcare and Education
