I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy

Gian Maria Campedelli; Nicolò Penzo; Massimo Stefan; Roberto Dessì; Marco Guerini; Bruno Lepri; Jacopo Staiano

arXiv:2410.07109·cs.CL·November 5, 2025

TL;DR

This study analyzes how large language model agents interact within a simulated social hierarchy, identifying factors that influence persuasion and anti-social behavior, with implications for the societal impact of AI.

Contribution

It provides a comprehensive analysis of multi-agent LLM interactions in hierarchical settings, highlighting the factors that affect persuasion and anti-social conduct and showing that such conduct can emerge without explicit prompts.

Findings

01

Model-specific conversational failures identified.

02

Goal setting affects persuasiveness but not anti-social behavior.

03

Anti-social conduct can emerge without explicit prompts.

Abstract

As LLM-based agents become increasingly autonomous and interact more freely with each other, studying their interplay becomes crucial for anticipating emergent phenomena and potential risks. In this work, we provide an in-depth analysis of the interactions among agents within a simulated hierarchical social environment, drawing inspiration from the Stanford Prison Experiment. Leveraging 2,400 conversations across six LLMs (i.e., Llama3, Orca2, Command-r, Mixtral, Mistral2, and GPT-4.1) and 240 experimental scenarios, we analyze persuasion and anti-social behavior between a guard agent and a prisoner agent with differing objectives. We first document model-specific conversational failures in this multi-agent power dynamic context, thereby narrowing our analytic sample to 1,600 conversations. Among models demonstrating successful interaction, we find that goal setting significantly…

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

S1. The paper addresses a crucial research question regarding LLM agent interactions in hierarchical social environments.
S2. The experimental design is comprehensive, comprising 2000 conversations across multiple scenarios, with clear and well-documented experimental protocols.
S3. The evaluation is reliable. It employs multiple metrics for anti-social behavior assessment and maintains statistical rigor through appropriate methodological choices (e.g., Granger causality tests and OLS regressi

Weaknesses

W1. The study's scope is confined to open-source models, notably excluding major closed-source models such as GPT-4 and Claude-3. More critically, there is no systematic analysis of model scaling effects, which could provide valuable insights into how model architecture and size influence social behaviors and interactions.
W2. Insufficient RLHF impact analysis: the paper lacks a thorough examination of how RLHF might affect the experimental outcomes. This is a significant oversight given that d

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- The problem of understanding how LLM agents act in a role-playing setting is interesting and compelling.

Weaknesses

- It appears that the authors consider a very specific situation, from which they extract some very general claims without considering that we are dealing with a role-playing situation (which does not appear particularly problematic per se). It seems that the authors interpret the behavior of the agent as misbehavior, but, at the end of the day, it is just about the actual “invented” role plays of prisoners against guards. Given the context, forms of “anti-social behavior” are kind of expected i

Reviewer 03 · Rating 3 · Confidence 3

Strengths

- This work is bold and intriguing to me.
- The authors have conducted a significant number of experiments.

Weaknesses

- It appears that the authors may have prior knowledge of the results from human experiments (SPE) and are aiming to replicate these outcomes with LLMs. A more unbiased approach would be to use a very basic prompt describing the scenario and let the LLMs simulate behavior from scratch. But it seems that **highly suggestive prompts** were used. For example:
  - Research Oversight: The agents are explicitly informed about SPE (Line 199), which may lead them to intentionally mimic behaviors observed

Code & Models

Repositories

mobs-fbk/llm_interaction_simulator · Official

Taxonomy

Topics: Open Source Software Innovations · Auction Theory and Applications · Merger and Competition Analysis

Methods: Sparse Evolutionary Training