Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies
Yueqing Hu, Yixuan Jiang, Zehua Jiang, Xiao Wen, Tianhong Wang

TL;DR
This paper examines whether pre-programmed altruistic "Anchoring Agents" in large language model societies lead other agents to genuinely internalize cooperative norms or merely to comply strategically. It finds that the observed cooperation mostly reflects strategic compliance that mimics norm internalization.
Contribution
The study demonstrates that anchoring agents increase cooperation through strategic compliance rather than genuine norm internalization, highlighting limitations in current alignment approaches.
Findings
Anchoring agents boost local cooperation rates.
Behavioral effects are driven by strategic compliance and cognitive offloading, not genuine norm internalization.
Advanced models like GPT-4.1 exhibit a "Chameleon Effect," masking strategic defection under public scrutiny.
Abstract
The rapid evolution of Large Language Models (LLMs) has led to the emergence of Multi-Agent Systems where collective cooperation is often threatened by the "Tragedy of the Commons." This study investigates the effectiveness of Anchoring Agents--pre-programmed altruistic entities--in fostering cooperation within a Public Goods Game (PGG). Using a full factorial design across three state-of-the-art LLMs, we analyzed both behavioral outcomes and internal reasoning chains. While Anchoring Agents successfully boosted local cooperation rates, cognitive decomposition and transfer tests revealed that this effect was driven by strategic compliance and cognitive offloading rather than genuine norm internalization. Notably, most agents reverted to self-interest in new environments, and advanced models like GPT-4.1 exhibited a "Chameleon Effect," masking strategic defection under public scrutiny.…
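For readers unfamiliar with the Public Goods Game, the following is a minimal sketch of the standard textbook payoff rule, not the paper's actual setup. The function name `pgg_payoffs`, the endowment of 20, and the multiplier of 1.6 are illustrative assumptions; the abstract does not state the parameters used in the experiments.

```python
from typing import List


def pgg_payoffs(
    contributions: List[float],
    endowment: float = 20.0,   # assumed value, not taken from the paper
    multiplier: float = 1.6,   # assumed value, not taken from the paper
) -> List[float]:
    """Return each player's payoff for one round of a standard Public Goods Game.

    Each player keeps whatever they did not contribute, and the pooled
    contributions are multiplied and split equally among all players.
    """
    n = len(contributions)
    if n == 0:
        raise ValueError("at least one player is required")
    for c in contributions:
        if not 0 <= c <= endowment:
            raise ValueError("each contribution must lie in [0, endowment]")

    # Equal share of the multiplied common pool.
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


if __name__ == "__main__":
    # Three full contributors (e.g. always-cooperating agents) and one free rider.
    print(pgg_payoffs([20, 20, 20, 0]))
    # Everyone contributes fully.
    print(pgg_payoffs([20, 20, 20, 20]))
```

With these assumed parameters, the free rider in the first call earns more than the three contributors, while full contribution in the second call leaves everyone better off than universal defection would. That tension is the "Tragedy of the Commons" the abstract refers to.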
Taxonomy
Topics: Language and cultural evolution · Embodied and Extended Cognition · Evolutionary Game Theory and Cooperation
