I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy
Gian Maria Campedelli, Nicolò Penzo, Massimo Stefan, Roberto Dessì, Marco Guerini, Bruno Lepri, Jacopo Staiano

TL;DR
This study analyzes how large language model agents interact within a simulated social hierarchy, revealing factors influencing persuasion and anti-social behaviors, with implications for AI societal impact.
Contribution
It provides a comprehensive analysis of multi-agent LLM interactions in hierarchical settings, highlighting the factors that affect persuasion and anti-social conduct, and showing that anti-social behavior can emerge without explicit prompting.
Findings
Model-specific conversational failures identified.
Goal setting affects persuasiveness but not anti-social behavior.
Anti-social conduct can emerge without explicit prompts.
Abstract
As LLM-based agents become increasingly autonomous and interact more freely with each other, studying the interplay among them becomes crucial to anticipate emergent phenomena and potential risks. In this work, we provide an in-depth analysis of the interactions among agents within a simulated hierarchical social environment, drawing inspiration from the Stanford Prison Experiment. Leveraging 2,400 conversations across six LLMs (i.e., Llama3, Orca2, Command-r, Mixtral, Mistral2, and GPT-4.1) and 240 experimental scenarios, we analyze persuasion and anti-social behavior between a guard and a prisoner agent with differing objectives. We first document model-specific conversational failures in this multi-agent power dynamic context, thereby narrowing our analytic sample to 1,600 conversations. Among models demonstrating successful interaction, we find that goal setting significantly…
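The sample sizes quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the per-scenario figure follows directly from the quoted totals, while the per-model figures assume an even split of scenarios and conversations across the six LLMs, which the abstract does not state.

```python
# Figures quoted in the abstract.
TOTAL_CONVERSATIONS = 2_400
NUM_SCENARIOS = 240
NUM_LLMS = 6
ANALYTIC_SAMPLE = 1_600

# Follows directly from the quoted totals.
conversations_per_scenario = TOTAL_CONVERSATIONS // NUM_SCENARIOS

# ASSUMPTION (not stated in the abstract): scenarios are split evenly across models.
scenarios_per_llm = NUM_SCENARIOS // NUM_LLMS
conversations_per_llm = scenarios_per_llm * conversations_per_scenario

# ASSUMPTION: the narrowed sample keeps or drops each model's conversations as a whole.
llms_retained = ANALYTIC_SAMPLE // conversations_per_llm

print(conversations_per_scenario, scenarios_per_llm,
      conversations_per_llm, llms_retained)
```

Under these assumptions, the narrowed sample of 1,600 conversations would correspond to four of the six models; the abstract itself does not say which models were excluded.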
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
S1. The paper addresses a crucial research question regarding LLM agent interactions in hierarchical social environments.
S2. The experimental design is comprehensive, comprising 2000 conversations across multiple scenarios, with clear and well-documented experimental protocols.
S3. The evaluation is reliable. It employs multiple metrics for anti-social behavior assessment and maintains statistical rigor through appropriate methodological choices (e.g., Granger causality tests and OLS regressi
W1. The study's scope is confined to open-source models, notably excluding major closed-source models such as GPT-4 and Claude-3. More critically, there is no systematic analysis of model scaling effects, which could provide valuable insights into how model architecture and size influence social behaviors and interactions.
W2. Insufficient RLHF impact analysis: the paper lacks a thorough examination of how RLHF might affect the experimental outcomes. This is a significant oversight given that d
- The problem of understanding how LLM agents act in a role-playing setting is interesting and compelling.
- It appears that the authors consider a very specific situation, from which they extract some very general claims without considering that we are dealing with a role-playing situation (which does not appear particularly problematic per se). It seems that the authors interpret the behavior of the agent as misbehavior, but, at the end of the day, it is just about the actual “invented” role plays of prisoners against guards. Given the context, forms of “anti-social behavior” are kind of expected i
- This work is bold and intriguing to me.
- The authors have conducted a significant number of experiments.
- It appears that the authors may have prior knowledge of the results from human experiments (SPE) and are aiming to replicate these outcomes with LLMs. A more unbiased approach would be to use a very basic prompt describing the scenario and let the LLMs simulate behavior from scratch. But it seems that **highly suggestive prompts** were used. For example:
  - Research Oversight: The agents are explicitly informed about SPE (Line 199), which may lead them to intentionally mimic behaviors observed
Taxonomy
Topics: Open Source Software Innovations · Auction Theory and Applications · Merger and Competition Analysis
Methods: Sparse Evolutionary Training
