TL;DR
GAMBIT is a benchmark for evaluating imposter detectors in multi-agent LLM systems against adaptive adversaries. It pairs three evaluation modes with a dataset of 27,804 labeled instances spanning 240 co-evolved imposter strategies.
Contribution
It introduces a multi-mode benchmark, a labeled dataset, and an adaptive imposter agent, so that detectors can be evaluated against adversaries that evolve to evade them.
Findings
Zero-shot detection scores can be misleading when the adversary adapts.
Meta-learned detectors adapt 20x faster than their non-meta-learned counterparts.
The benchmark reveals significant performance gaps in adaptive attack detection.
Abstract
In multi-agent systems (MAS), a single deceptive agent can nullify all gains of an agentic AI collective and evade deployed defenses. However, existing adversarial studies on MAS target only shallow tasks and do not consider adaptive adversaries, which evolve their strategies to evade the very detectors trained to catch them. To address that gap, we introduce GAMBIT, a benchmark with three evaluation modes and two independent scores for evaluating imposter detectors: the first two modes measure zero-shot detection under increasing distribution shift, and a third recalibration mode measures how quickly a detector adapts to novel attacks from just 20 labeled examples. The benchmark comes with a dataset of 27,804 labeled instances spanning 240 co-evolved imposter strategies. Our contributions are threefold: (1) Using chess as a substrate deep reasoning problem and Gemini 3.1 Pro for…
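The contrast between zero-shot detection and recalibration from 20 labeled examples can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the one-dimensional "suspicion score", the threshold detector, the synthetic Gaussian data, and the names `fit_threshold`, `accuracy`, and `make_split` are hypothetical and do not come from GAMBIT or its dataset.

```python
import random


def accuracy(scores, labels, threshold):
    """Fraction of correct predictions when score >= threshold means imposter (1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fit_threshold(scores, labels):
    """Return the lowest candidate threshold that maximizes accuracy on labeled data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = accuracy(scores, labels, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def make_split(rng, n, honest_mean, imposter_mean):
    """Synthetic scores: alternating honest (0) and imposter (1) examples."""
    scores, labels = [], []
    for i in range(n):
        y = i % 2
        mean = imposter_mean if y else honest_mean
        scores.append(rng.gauss(mean, 0.5))
        labels.append(y)
    return scores, labels


rng = random.Random(0)
# Hypothetical "known" attacks: imposters are easy to separate from honest agents.
train_scores, train_labels = make_split(rng, 400, 0.0, 3.0)
# Hypothetical "novel" attacks: imposters have shifted toward honest behaviour.
support_scores, support_labels = make_split(rng, 20, 0.0, 1.5)  # 20 labeled examples
test_scores, test_labels = make_split(rng, 400, 0.0, 1.5)

# Zero-shot: apply the threshold fitted on known attacks to the novel attacks.
zero_shot_threshold = fit_threshold(train_scores, train_labels)
zero_shot_acc = accuracy(test_scores, test_labels, zero_shot_threshold)

# Recalibration: refit the threshold on the 20 labeled novel examples only.
recal_threshold = fit_threshold(support_scores, support_labels)
recal_acc = accuracy(test_scores, test_labels, recal_threshold)

print(f"zero-shot accuracy:    {zero_shot_acc:.3f}")
print(f"recalibrated accuracy: {recal_acc:.3f}")
```

The sketch reports the two numbers separately, in the spirit of the benchmark's independent scores: one for how well a frozen detector transfers, and one for how much a small labeled budget recovers. A real detector and a real recalibration procedure would replace the threshold fit here.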
