Soft-Label Governance for Distributional Safety in Multi-Agent Systems

Aizierjiang Aiersilan; Raeli Savitt

arXiv:2604.19752 · cs.MA · April 23, 2026

TL;DR

SWARM is a simulation framework that uses soft probabilistic labels for continuous risk assessment and governance in multi-agent systems. Its experiments reveal safety-welfare tradeoffs and show that governance interventions need careful calibration.

Contribution

The paper presents a soft-label approach and a modular governance engine for distributional safety, with empirical analysis across multiple scenarios and real-world agent applications.

Findings

01. Strict governance can reduce welfare by over 40% without safety gains.

02. Aggressive internalization of externalities collapses welfare from +262 to -67 while leaving toxicity unchanged.

03. Careful calibration of circuit breakers balances welfare and toxicity.

Abstract

Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels $p = P(v = +1) \in [0, 1]$, enabling continuous-valued payoff computation, toxicity measurement, and governance intervention. SWARM implements a modular governance engine with configurable levers (transaction taxes, circuit breakers, reputation decay, and random audits) and quantifies their effects through probabilistic metrics including expected toxicity $\mathbb{E}[1 - p \mid \text{accepted}]$ and quality gap $\mathbb{E}[p \mid \text{accepted}]…
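
To make the expected-toxicity metric from the abstract concrete, the sketch below estimates $\mathbb{E}[1 - p \mid \text{accepted}]$ as a sample mean over accepted interactions and compares it with what a binary labeling of the same data would report. It is an illustration under stated assumptions, not SWARM's implementation: the function names, the toy data, and the 0.5 binarization threshold are all hypothetical and do not come from the paper.

```python
from typing import Sequence


def expected_toxicity(p: Sequence[float], accepted: Sequence[bool]) -> float:
    """Sample estimate of E[1 - p | accepted].

    p[i] is the soft label P(v = +1) of interaction i; accepted[i] says
    whether that interaction was accepted. (Hypothetical helper.)
    """
    if len(p) != len(accepted):
        raise ValueError("p and accepted must have the same length")
    kept = [pi for pi, a in zip(p, accepted) if a]
    if not kept:
        raise ValueError("no accepted interactions to average over")
    if any(not 0.0 <= pi <= 1.0 for pi in kept):
        raise ValueError("soft labels must lie in [0, 1]")
    return sum(1.0 - pi for pi in kept) / len(kept)


def binary_toxicity(p: Sequence[float], accepted: Sequence[bool],
                    threshold: float = 0.5) -> float:
    """Same metric after collapsing each soft label to a hard 0/1 label.

    The threshold is an assumption made only for this comparison.
    """
    hard = [1.0 if pi >= threshold else 0.0 for pi in p]
    return expected_toxicity(hard, accepted)


# Toy data: four interactions, the third one rejected.
labels = [0.9, 0.6, 0.2, 0.8]
accepted = [True, True, False, True]

soft = expected_toxicity(labels, accepted)  # mean of 0.1, 0.4, 0.2
hard = binary_toxicity(labels, accepted)    # every accepted label rounds to 1

print(f"soft-label toxicity: {soft:.3f}")
print(f"binary-label toxicity: {hard:.3f}")
```

On this toy data the binary labeling reports zero toxicity among accepted interactions, while the soft-label estimate stays positive because it keeps the residual uncertainty in each label. That is the kind of information the abstract says binary classification discards.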
