12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation

Ahmet Bahaddin Ersoz

arXiv:2605.01986·cs.AI·May 5, 2026

TL;DR

This paper introduces a multi-agent LLM benchmark inspired by 12 Angry Men. It analyzes how GPT-4o and Llama-4-Scout debate and reach verdicts, and finds that alignment intensity influences deliberative flexibility more than capability does.

Contribution

It presents a novel multi-agent debate framework with LLMs conditioned on personas, comparing the effects of alignment levels on deliberation dynamics and outcomes.

Findings

01

Seventeen of eighteen runs ended in hung juries, pointing to anchoring as a key failure mode.

02

GPT-4o and Llama-4-Scout exhibit different internal dynamics and verdicts.

03

Alignment intensity, not capability, primarily determines deliberative flexibility.

Abstract

What if the twelve jurors of Sidney Lumet's 12 Angry Men (1957) were not men, but large language models? Would the one juror who disagrees still be able to change everyone's mind? This paper instantiates that scenario as a multi-agent benchmark for LLM deliberation: twelve agents, each conditioned on a film-faithful persona, debate the film's murder case using a multi-agent framework. Two models representing opposite ends of the RLHF spectrum are tested: GPT-4o (closed-source, heavy alignment) and Llama-4-Scout (open-weight, lighter alignment), across three conditions (baseline, open-minded prompt, no initial vote), with N = 3 replications per cell (18 runs total). Three findings emerge. (i) Seventeen of eighteen runs end in a hung jury (a state where the jury fails to reach a unanimous verdict); the film's central event, gradual minority-to-majority persuasion, almost never occurs,…
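The experimental design in the abstract (two models, three conditions, three replications per cell) and its definition of a hung jury can be illustrated with a minimal Python sketch. This is not the paper's code: the function names `design_grid` and `classify_verdict` and the vote labels "guilty" and "not guilty" are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product

# Values taken from the abstract.
MODELS = ["GPT-4o", "Llama-4-Scout"]
CONDITIONS = ["baseline", "open-minded prompt", "no initial vote"]
REPLICATIONS = 3
JURY_SIZE = 12


def design_grid():
    """Enumerate every (model, condition, replication) run in the design."""
    return [
        (model, condition, rep)
        for model, condition in product(MODELS, CONDITIONS)
        for rep in range(1, REPLICATIONS + 1)
    ]


def classify_verdict(votes):
    """Return the unanimous verdict, or "hung" if the jurors disagree.

    The vote labels are hypothetical; the abstract only states that a
    hung jury is one that fails to reach a unanimous verdict.
    """
    if len(votes) != JURY_SIZE:
        raise ValueError(f"expected {JURY_SIZE} votes, got {len(votes)}")
    counts = Counter(votes)
    if len(counts) == 1:
        return next(iter(counts))
    return "hung"


if __name__ == "__main__":
    runs = design_grid()
    print(len(runs))  # 2 models x 3 conditions x 3 replications = 18
    print(classify_verdict(["not guilty"] * 12))
    print(classify_verdict(["guilty"] * 11 + ["not guilty"]))
```

Under this definition a single holdout is enough to produce a hung jury, so the 11-to-1 split in the last call is classified the same way as a 6-to-6 split.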
