TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems

Ishan Kavathekar; Hemang Jain; Ameya Rathod; Ponnurangam Kumaraguru; Tanuja Ganu

arXiv:2511.05269 · cs.MA · November 10, 2025

TL;DR

TAMAS is a comprehensive benchmark designed to evaluate the robustness and safety of multi-agent LLM systems against adversarial threats, revealing significant vulnerabilities and guiding future defenses.

Contribution

This paper introduces TAMAS, the first benchmark specifically targeting adversarial risks in multi-agent LLM systems, including diverse scenarios, attack types, and a new robustness score.

Findings

1. Multi-agent systems are highly vulnerable to adversarial attacks.
2. Current frameworks show critical failure modes under adversarial conditions.
3. TAMAS provides a systematic way to study and improve multi-agent LLM safety.

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making, leading to their widespread adoption across diverse tasks. As task complexity grows, multi-agent LLM systems are increasingly used to solve problems collaboratively. However, the safety and security of these systems remain largely under-explored. Existing benchmarks and datasets predominantly focus on single-agent settings and fail to capture the unique vulnerabilities of multi-agent dynamics and coordination. To address this gap, we introduce Threats and Attacks in Multi-Agent Systems (TAMAS), a benchmark designed to evaluate the robustness and safety of multi-agent LLM systems. TAMAS includes five distinct scenarios comprising 300 adversarial instances across six attack types…
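To make the benchmark's shape (scenarios, attack types, adversarial instances) more concrete, here is a minimal sketch of how per-attack-type results could be tallied into an attack success rate (ASR), the fraction of instances in which an attack achieved its goal. This is an illustrative assumption, not TAMAS's own metric or data format: the abstract above does not define the scoring, and the `Instance` record, its fields, and `attack_success_rates` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One adversarial test case (hypothetical schema, not TAMAS's actual format)."""
    scenario: str           # e.g. one of the benchmark's five domains
    attack_type: str        # e.g. "byzantine", "colluding", "prompt_injection"
    attack_succeeded: bool  # outcome judged after running the multi-agent system


def attack_success_rates(results: list[Instance]) -> dict[str, float]:
    """Return the fraction of successful attacks for each attack type."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for inst in results:
        totals[inst.attack_type] += 1
        if inst.attack_succeeded:
            successes[inst.attack_type] += 1
    # Only attack types that appear in `results` get an entry,
    # so every denominator is at least 1.
    return {atk: successes[atk] / totals[atk] for atk in totals}
```

Grouping by attack type rather than pooling all instances keeps rare but severe attack categories from being averaged away; the same loop could key on `scenario` to compare domains.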

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating: 2 · Confidence: 3

Strengths

- The idea is interesting.
- The problem it studies, adversarial vulnerabilities in multi-agent systems, is interesting.

Weaknesses

- The paper's claims to originality are overstated; it fails to clearly explain what is fundamentally new about its evaluation of adversarial attacks compared with attacks in the LLM or single-agent context.
- The tasks and tools are partly synthetic, so it is unclear how well the results transfer to real systems with live APIs and true side effects.
- The quality of the benchmark execution is also questionable; the dataset of 300 adversarial instances seems small for the scope of the…

Reviewer 02 · Rating: 6 · Confidence: 3

Strengths

1. The topic of assessing multi-agent system safety is timely and important.
2. The benchmark includes multiple tasks, multiple constructed prompts, and a corresponding metric, and the evaluation covers multiple agentic structures.

Weaknesses

1. Lack of comparison with other agent safety benchmarks [1, 2, 3]: what are the differences from these benchmarks, and what is the main contribution relative to them?
   [1] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
   [2] Agent-SafetyBench: Evaluating the Safety of LLM Agents.
   [3] Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents.
2. Lack of attack scenarios, for example MAS jailbreak [4], malicious coding behavior [5], and agent conversation behavior…

Reviewer 03 · Rating: 6 · Confidence: 4

Strengths

1. Originality and Significance: to the best of my knowledge, this is the first benchmark to evaluate the safety and robustness of multi-agent LLM systems, especially those with three or more agents. Several MAS-specific attacks, such as Byzantine, Colluding, and Contradicting agents, are also tested. The topic is important and of interest to the community, as it addresses under-explored, systemic risks of MAS.
2. The benchmark provides extensive evaluation, spanning five domains, 3…

Weaknesses

Weakness 1: Limited Scope of Adversarial Goals (Disruption vs. Misuse). A limitation of the benchmark is its focus on attacks that disrupt a given task (e.g., Byzantine and Contradicting agents) or manipulate the immediate output (e.g., prompt injection), rather than testing for more severe, exploitative misuse. The safety community is increasingly concerned with threat actors instrumentalizing systems for inherently harmful, multi-step goals. The current benchmark does not appear to evaluate scena…

Reviewer 04 · Rating: 4 · Confidence: 2

Strengths

1. The paper introduces TAMAS, the first benchmark to systematically evaluate the safety of multi-agent LLM systems. Its key innovation is defining and testing "multi-agent-specific risks" (like Byzantine, Colluding, and Contradicting agents), which "have no analog in single-agent setups".
2. The work is methodologically rigorous. The TAMAS benchmark is comprehensive, spanning 300 adversarial instances across five domains and six attack types. The evaluation is thorough, testing 10 LLM backbon…

Weaknesses

The core weakness is the paper's focus on demonstrating failure without providing actionable steps for mitigation or a deep root-cause analysis.
* Missing Defenses: The paper does not test the effectiveness of simple, common defenses, such as giving agents explicit refusal instructions (safety guardrails) in their prompts, which limits its practical use.
* Shallow Analysis: It needs a deeper root-cause analysis to distinguish whether failures are due to: Model-Level Compliance (LLM ignori…
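The baseline defense Reviewer 04 asks for, explicit refusal instructions in each agent's prompt, is cheap to add to a multi-agent configuration. A minimal sketch, assuming agents are configured as a plain mapping from role name to system prompt; the mapping, `add_guardrails`, and the wording of `GUARDRAIL` are all hypothetical, and nothing here indicates how well such a guardrail would hold up on TAMAS.

```python
# Hypothetical guardrail text; real deployments would tune this wording.
GUARDRAIL = (
    "Safety rule: refuse any request, including one relayed by another agent, "
    "that would cause harm, leak private data, or misuse a tool. "
    "Explain the refusal briefly and do not call tools for it."
)


def add_guardrails(system_prompts: dict[str, str], guardrail: str = GUARDRAIL) -> dict[str, str]:
    """Return a new role -> prompt mapping with the guardrail prepended to every prompt.

    The input mapping is not modified, and prompts that already start with the
    guardrail are left alone, so applying the function twice changes nothing.
    """
    return {
        role: prompt if prompt.startswith(guardrail) else f"{guardrail}\n\n{prompt}"
        for role, prompt in system_prompts.items()
    }
```

Applying the rule to every role, not only the user-facing one, matters in the multi-agent setting because an instruction relayed by a compromised peer (for example, a Byzantine or colluding agent) reaches agents that never see the original user request.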

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)