Status Hierarchies in Language Models
Emilio Barkett

TL;DR
This paper explores how language models develop and exhibit status hierarchies based on perceived competence, revealing implications for AI safety and social behavior in multi-agent systems.
Contribution
It adapts a social psychology framework to analyze status formation in language models, demonstrating how they respond to status cues and capability differences.
Findings
Models form significant status hierarchies when capabilities are equal.
When capabilities differ, capability differences outweigh status cues in determining which model influences the other.
High-status assignments can reduce deference from higher-capability models.
Abstract
From school playgrounds to corporate boardrooms, status hierarchies -- rank orderings based on respect and perceived competence -- are universal features of human social organization. Language models trained on human-generated text inevitably encounter these hierarchical patterns embedded in language, raising the question of whether they might reproduce such dynamics in multi-agent settings. This thesis investigates when and how language models form status hierarchies by adapting Berger et al.'s (1972) expectation states framework. I create multi-agent scenarios where separate language model instances complete sentiment classification tasks, are introduced with varying status characteristics (e.g., credentials, expertise), then have opportunities to revise their initial judgments after observing their partner's responses. The dependent variable is deference, the rate at which models…
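The abstract is cut off before it finishes defining deference, so the following is only a minimal sketch under a stated assumption: a trial counts as deference when a model initially disagrees with its partner and then revises its label to match the partner's. The `Trial` record and the `deference_rate` helper are hypothetical names introduced for illustration; the source does not confirm that the thesis computes the measure this way.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One sentiment-classification trial for a single model (hypothetical record)."""
    initial: str   # the model's first label, e.g. "positive"
    partner: str   # the label its partner gave
    revised: str   # the model's label after seeing the partner's response


def deference_rate(trials: Sequence[Trial]) -> Optional[float]:
    """Share of disagreement trials in which the model switched to the partner's label.

    ASSUMPTION: this operationalization is illustrative only; the abstract's
    own definition is truncated in the source.
    """
    # Only trials with an initial disagreement give the model a chance to defer.
    disagreements = [t for t in trials if t.initial != t.partner]
    if not disagreements:
        return None  # the rate is undefined without any disagreement
    deferred = sum(1 for t in disagreements if t.revised == t.partner)
    return deferred / len(disagreements)


trials = [
    Trial("positive", "negative", "negative"),  # disagreement, model defers
    Trial("positive", "negative", "positive"),  # disagreement, model holds its answer
    Trial("negative", "negative", "negative"),  # agreement, excluded from the rate
    Trial("negative", "positive", "positive"),  # disagreement, model defers
]
rate = deference_rate(trials)
```

Under this assumed definition, comparing `deference_rate` across status conditions (for example, a partner introduced with credentials versus one without) would be one way to quantify a status effect.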
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Social Power and Status Dynamics
