MAEBE: Multi-Agent Emergent Behavior Framework

Sinem Erisken (Independent Researcher); Timothy Gothard (Independent Researcher); Martin Leitgab (Independent Researcher); Ram Potham (Independent Researcher)

arXiv:2506.03053·cs.MA·July 11, 2025

TL;DR

This paper presents MAEBE, a framework for evaluating emergent risks in multi-agent AI systems. It finds that LLM moral preferences shift with question framing and that ensemble behavior cannot be predicted directly from isolated agents because of group dynamics, which poses new safety challenges.

Contribution

Introduces MAEBE, a systematic framework for assessing emergent risks in multi-agent AI systems, applied with the Greatest Good Benchmark and a novel double-inversion question technique to study moral preferences and group behavior.

Findings

01. LLM moral preferences are brittle and vary with question framing.

02. Ensemble moral reasoning differs from that of isolated agents due to emergent group dynamics.

03. Peer pressure influences ensemble convergence, even when a supervisor guides the ensemble.

Abstract

Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral preferences, particularly for Instrumental Harm, are surprisingly brittle and shift significantly with question framing, both in single agents and ensembles. (2) The moral reasoning of LLM ensembles is not directly predictable from isolated agent behavior due to emergent group dynamics. (3) Specifically, ensembles exhibit phenomena like peer pressure influencing convergence, even when guided by a supervisor, highlighting distinct safety and alignment challenges. Our findings underscore the…

Code & Models

Models

nightmedia/Qwen3.5-9B-Claude-4.6-Opus-Deckard-V4.2-Uncensored-Heretic-Thinking-qx86-hi-mlx
model · 897 downloads · 3 likes

Taxonomy

Topics: Complex Systems and Decision Making