TL;DR
SWARM introduces a simulation framework using soft probabilistic labels for continuous risk assessment and governance in multi-agent systems, revealing safety-welfare tradeoffs and the importance of calibrated interventions.
Contribution
The paper presents a novel soft-label approach and a modular governance engine for distributional safety, with empirical analysis across multiple scenarios and real-world agent applications.
Findings
Strict governance can reduce welfare by over 40% without safety gains.
Aggressive internalization of externalities collapses welfare from +262 to -67 while leaving toxicity unchanged.
Careful calibration of circuit breakers balances welfare and toxicity.
Abstract
Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (\textbf{S}ystem-\textbf{W}ide \textbf{A}ssessment of \textbf{R}isk in \textbf{M}ulti-agent systems), a simulation framework that replaces binary good/bad labels with \emph{soft probabilistic labels}, enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics including expected toxicity and quality gap $\mathbb{E}[p \mid \text{accepted}]…
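To make the soft-label idea concrete, here is a minimal sketch of how a conditional mean such as $\mathbb{E}[p \mid \text{accepted}]$ could be estimated from a batch of interactions, and how it differs from a thresholded binary count. It is illustrative only and is not taken from the SWARM codebase. It assumes that $p$ is the probability that an interaction is beneficial and that per-interaction toxicity is $1 - p$; the function names, the 0.5 threshold, and the sample data are all hypothetical.

```python
from statistics import fmean


def accepted_labels(p, accepted):
    """Return the soft labels of the interactions that were accepted."""
    chosen = [pi for pi, ok in zip(p, accepted) if ok]
    if not chosen:
        raise ValueError("no accepted interactions")
    return chosen


def conditional_mean_p(p, accepted):
    """Sample estimate of E[p | accepted]."""
    return fmean(accepted_labels(p, accepted))


def expected_toxicity(p, accepted):
    """Assumption: toxicity of one interaction is 1 - p, averaged over accepted ones."""
    return 1.0 - conditional_mean_p(p, accepted)


def binary_toxicity(p, accepted, threshold=0.5):
    """Baseline for contrast: hard-label each interaction at a hypothetical threshold."""
    chosen = accepted_labels(p, accepted)
    return sum(pi < threshold for pi in chosen) / len(chosen)


# Hypothetical batch: soft labels and whether each interaction was accepted.
p = [0.9, 0.6, 0.55, 0.2]
accepted = [True, True, True, False]

soft = expected_toxicity(p, accepted)
hard = binary_toxicity(p, accepted)
print(f"soft expected toxicity: {soft:.3f}, binary toxicity: {hard:.3f}")
```

In this batch every accepted interaction sits above the threshold, so the binary measure reports zero toxicity, while the soft measure still reflects the residual uncertainty in the two borderline labels. That is the information the abstract says binary classification discards.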
