MaMa: A Game-Theoretic Approach for Designing Safe Agentic Systems
Jonathan Nöther, Adish Singla, Goran Radanovic

TL;DR
This paper introduces MaMa, a game-theoretic framework that uses LLMs to automatically design multi-agent systems that remain safe even when a subset of their agents is compromised by an adversary.
Contribution
The paper presents MaMa, a novel algorithm that formulates safe system design as a Stackelberg game and employs LLM-based adversarial search to enhance safety against worst-case attacks.
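One generic way to write a Stackelberg design problem of this kind is as a max-min objective. The notation below is an assumption made for illustration and is not taken from the paper: \Pi is the space of system designs, A(\pi) is the set of agents in design \pi, C is the subset the adversary compromises, k is a hypothetical cap on how many agents may be compromised, and S(\pi, C) is the resulting safety score.

```latex
% Follower (Meta-Adversary): best response to a committed design \pi
C^{\star}(\pi) \in \arg\min_{C \subseteq A(\pi),\ |C| \le k} S(\pi, C)

% Leader (Meta-Agent): commits to the design that is safest under that best response
\pi^{\star} \in \arg\max_{\pi \in \Pi} \ S\bigl(\pi, C^{\star}(\pi)\bigr)
  = \arg\max_{\pi \in \Pi} \ \min_{C \subseteq A(\pi),\ |C| \le k} S(\pi, C)
```

The leader moves first and the follower observes the design before attacking, which is why the design is evaluated against its worst-case compromise rather than against an average one.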
Findings
MaMa systems effectively defend against worst-case adversarial attacks.
Designed systems maintain high task performance while ensuring safety.
Robustness extends to different adversaries and attack objectives.
Abstract
LLM-based multi-agent systems have demonstrated impressive capabilities, but they also introduce significant safety risks when individual agents fail or behave adversarially. In this work, we study the automated design of agentic systems that remain safe even when a subset of agents is compromised. We formalize this challenge as a Stackelberg security game between a system designer (the Meta-Agent) and a best-responding Meta-Adversary that selects and compromises a subset of agents to minimize safety. We propose Meta-Adversary-Meta-Agent (MaMa), a novel algorithm for approximately solving this game and automatically designing safe agentic systems. Our approach uses LLM-based adversarial search, where the Meta-Agent iteratively proposes system designs and receives feedback based on the strongest attacks discovered by the Meta-Adversary. Empirical evaluations across diverse environments…
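The propose-attack-feedback loop described in the abstract can be sketched as a small max-min search. This is a toy illustration under stated assumptions, not the authors' implementation: the LLM-based Meta-Agent and Meta-Adversary are replaced by plain Python functions, the names `best_response`, `mama_sketch`, `toy_safety`, and `toy_propose` are hypothetical, and the budget `k` and the guard-based safety score are invented for the example. Task performance, which the paper also evaluates, is ignored here.

```python
from itertools import combinations


def best_response(design, safety, k):
    """Adversary stand-in: exhaustively find the subset of at most k agents
    whose compromise minimizes the safety score of the given design."""
    agents = range(len(design))
    subsets = [c for r in range(k + 1) for c in combinations(agents, r)]
    return min(subsets, key=lambda subset: safety(design, subset))


def mama_sketch(initial, propose, safety, k, rounds):
    """Designer stand-in: propose a design, evaluate it against the strongest
    attack found, and use that attack as feedback for the next proposal."""
    best_design, best_value = None, float("-inf")
    design = initial
    for _ in range(rounds):
        attack = best_response(design, safety, k)
        value = safety(design, attack)  # worst-case safety of this design
        if value > best_value:
            best_design, best_value = design, value
        design = propose(design, attack)
    return best_design, best_value


# Toy environment (assumption): a design is a tuple of booleans, where
# design[i] says whether agent i is wrapped in a guard.
def toy_safety(design, compromised):
    # Each compromised agent without a guard costs 0.5 safety.
    return 1.0 - sum(0.5 for i in compromised if not design[i])


def toy_propose(design, attack):
    # Feedback step: add a guard to the first agent the attack exploited.
    for i in attack:
        if not design[i]:
            return design[:i] + (True,) + design[i + 1:]
    return design


if __name__ == "__main__":
    result = mama_sketch((False, False, False), toy_propose, toy_safety, k=1, rounds=4)
    print(result)
```

In this toy setting each round patches the agent that the strongest attack exploited, so worst-case safety only improves once no single unguarded agent remains. An LLM-driven version would replace the exhaustive `best_response` with a search over attack strategies, since enumerating subsets does not scale and does not cover how a compromised agent behaves.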
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Smart Grid Security and Resilience · Infrastructure Resilience and Vulnerability Analysis
