The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration

Jiayuan Liu; Shiyi Du; Weihua Du; Mingyu Guo; Vincent Conitzer

arXiv:2604.17139·cs.CL·April 21, 2026

TL;DR

This paper introduces Token-Level Round-Robin (RR) Collaboration, in which agents interleave token generation within a shared context so that a multi-agent LLM system stays robust even when adversarially corrupted agents form a majority. The method is supported by theoretical proofs and empirical results.

Contribution

The paper proposes token-level interleaving as an alternative to response-level voting, which aggregates only finished answers and is therefore blind to flawed intermediate reasoning. The proposal is backed by formal analysis and experiments.

Findings

01 Majority Voting (MAJ) collapses when corrupted agents form a majority in a multi-agent LLM system.

02 Round-Robin (RR) collaboration maintains high accuracy even when a majority of the agents are adversarial.

03 Theoretical analysis shows that token-level interleaving creates a non-linear logic chain.
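The first finding can be illustrated with a toy example. The sketch below is not the paper's code: it assumes each agent contributes only a final answer string, and the answer values are made up. It shows that response-level majority voting returns whatever answer most agents give, so it follows the corrupted agents as soon as they outnumber the honest ones.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer.

    Ties go to the answer seen first, which is how Counter.most_common
    orders elements with equal counts.
    """
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers: "42" from honest agents, "17" from corrupted ones.
honest, corrupted = "42", "17"

# One corrupted agent out of three: the vote still returns the honest answer.
print(majority_vote([honest, honest, corrupted]))     # prints 42

# Two corrupted agents out of three: the vote returns the corrupted answer.
print(majority_vote([honest, corrupted, corrupted]))  # prints 17
```

The vote never sees how any agent arrived at its answer, which is the structural weakness the paper targets.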

Abstract

Multi-agent large language model (LLM) architectures increasingly rely on response-level aggregation, such as Majority Voting (MAJ), to raise reasoning ceilings. However, in open environments, agents are highly susceptible to stealthy contextual corruption, such as targeted prompt injections. We reveal a critical structural vulnerability in current multi-agent systems: response-level aggregation collapses when corrupted agents form a local majority. Because voting aggregates fully-formed conclusions, it is blind to flawed intermediate logic. To overcome this systematic limitation, we propose the Token-Level Round-Robin (RR) Collaboration, where agents sequentially interleave generation within a shared auto-regressive context. We formalize this process as a discrete-time dynamical system, proving that token-level interleaving transitions aggregation from a brittle counting of final votes…
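The interleaving described in the abstract can be sketched as a simple turn-taking loop. This is a minimal illustration under stated assumptions, not the authors' implementation: agents are modeled as plain Python callables that map the shared context to one next token, and the names `round_robin_generate`, `make_agent`, and the `<eos>` stop token are hypothetical. It shows only the control flow of sharing one auto-regressive context, not the robustness result.

```python
from typing import Callable, List, Sequence

# Assumption: an agent is any callable that reads the shared context
# and returns exactly one next token.
Agent = Callable[[Sequence[str]], str]


def round_robin_generate(
    agents: List[Agent],
    prompt: Sequence[str],
    max_tokens: int,
    eos: str = "<eos>",  # hypothetical stop token
) -> List[str]:
    """Generate tokens by letting agents take turns on one shared context."""
    context = list(prompt)
    for step in range(max_tokens):
        agent = agents[step % len(agents)]  # fixed cyclic turn order
        token = agent(context)              # agent sees everything written so far
        context.append(token)
        if token == eos:
            break
    return context[len(prompt):]            # only the newly generated tokens


def make_agent(name: str) -> Agent:
    """Toy agent that always emits its own name, to make turn order visible."""
    return lambda context: name


agents = [make_agent(n) for n in ("a", "b", "c")]
print(round_robin_generate(agents, ["Q:"], max_tokens=5))
# prints ['a', 'b', 'c', 'a', 'b']
```

In a real system each callable would wrap a language model's next-token step conditioned on the shared context; the cyclic order and stopping rule used here are illustrative choices.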
