The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration
Jiayuan Liu, Shiyi Du, Weihua Du, Mingyu Guo, Vincent Conitzer

TL;DR
This paper introduces Token-Level Round-Robin (RR) Collaboration, in which agents interleave token generation within a shared context, making multi-agent LLM systems more robust when adversarial agents form a majority. The method is supported by formal analysis and empirical results.
Contribution
It proposes a novel token-level interleaving approach that overcomes the limitations of response-level voting in multi-agent LLMs, backed by formal analysis and extensive experiments.
Findings
Majority Voting (MAJ) collapses when corrupted agents form a majority
Round-Robin (RR) collaboration maintains high accuracy even when most agents are adversarial
Theoretical analysis shows token-level interleaving creates a non-linear logic chain
Abstract
Multi-agent large language model (LLM) architectures increasingly rely on response-level aggregation, such as Majority Voting (MAJ), to raise reasoning ceilings. However, in open environments, agents are highly susceptible to stealthy contextual corruption, such as targeted prompt injections. We reveal a critical structural vulnerability in current multi-agent systems: response-level aggregation collapses when corrupted agents form a local majority. Because voting aggregates fully-formed conclusions, it is blind to flawed intermediate logic. To overcome this systematic limitation, we propose Token-Level Round-Robin (RR) Collaboration, where agents sequentially interleave generation within a shared auto-regressive context. We formalize this process as a discrete-time dynamical system, proving that token-level interleaving transitions aggregation from a brittle counting of final votes…
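To make the contrast in the abstract concrete, here is a minimal sketch of the two aggregation styles. It is not the authors' implementation. The `Agent` interface (a callable that maps the shared token context to one next token), the function names, the `max_tokens` budget, and the `<eos>` stop token are all assumptions introduced for illustration; a real system would call an LLM where the stub callable is used.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent interface (an assumption, not from the paper):
# given the shared token context so far, return exactly one next token.
Agent = Callable[[Sequence[str]], str]


def majority_vote(responses: Sequence[str]) -> str:
    """Response-level aggregation: pick the most frequent final answer.

    Only finished answers are counted, so the intermediate reasoning
    behind each answer is never inspected.
    """
    return Counter(responses).most_common(1)[0][0]


def round_robin_generate(
    agents: Sequence[Agent],
    prompt: Sequence[str],
    max_tokens: int,
    eos: str = "<eos>",  # assumed stop token
) -> List[str]:
    """Token-level interleaving: agents take turns extending one shared context.

    At step t, agent (t mod n) reads everything generated so far,
    including tokens written by the other agents, and appends one token.
    """
    context = list(prompt)
    for step in range(max_tokens):
        agent = agents[step % len(agents)]
        token = agent(context)
        if token == eos:
            break
        context.append(token)
    # Return only the newly generated tokens, not the prompt.
    return context[len(prompt):]
```

With two stub agents that always emit `"a"` and `"b"` respectively, `round_robin_generate` alternates between them on a single shared sequence, whereas `majority_vote` sees only each agent's final string. The sketch shows the mechanics only; it makes no claim about the accuracy or robustness results reported in the paper.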
