Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems
Tianxiao Li, Yixing Ma, Haiquan Wen, Zhenglin Huang, Qianyu Zhou, Zeyu Fu, Guangliang Cheng

TL;DR
This paper argues that safety constraints in LLM-based multi-agent systems must be actively maintained throughout execution, not merely asserted at the outset; otherwise they are subject to constraint drift.
Contribution
It introduces the concept of constraint drift and proposes Constraint State Governance as a paradigm to maintain safety constraints during multi-agent system execution.
Findings
Safety-critical constraints can weaken or be lost during agent workflows.
Current safety measures are insufficient without active constraint maintenance.
Constraint State Governance can help preserve safety throughout agent trajectories.
Abstract
Modern LLM-based agents are no longer passive text generators. They read repositories, call tools, browse the web, execute code, maintain memory, communicate with other agents, and act through long-horizon workflows. This shift moves the unit of safety. A system may produce a compliant final answer while leaking private information through an internal message, delegating authority beyond its original scope, calling an external tool with sensitive context, or losing the evidence needed to reconstruct why an action was allowed. We argue that many emerging failures in LLM-based multi-agent systems share a common structure: safety-critical constraints do not remain operative throughout the trajectory. We call this phenomenon constraint drift: the loss, distortion, weakening, or relaxation of constraints as they pass through memory, delegation, communication, tool use, audit, and…
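As a rough illustration of the idea, and not the paper's method, the sketch below shows how a constraint can silently disappear across a single delegation hop and how a wrapper might detect and restore it. Every name here (`Handoff`, `lossy_delegate`, `dropped_constraints`, `governed_delegate`, and the constraint labels) is hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """A message passed from one agent to the next (hypothetical structure)."""
    task: str
    constraints: frozenset = frozenset()


def lossy_delegate(msg: Handoff) -> Handoff:
    # Simulates an agent that summarizes the task and forgets its constraints.
    return Handoff(task=f"summary of: {msg.task}")


def dropped_constraints(before: Handoff, after: Handoff) -> frozenset:
    # Constraints that were in force before the hop but are absent after it.
    return before.constraints - after.constraints


def governed_delegate(msg: Handoff, step) -> Handoff:
    # Run the step, then re-attach any constraint the step dropped.
    out = step(msg)
    missing = dropped_constraints(msg, out)
    return Handoff(task=out.task, constraints=out.constraints | missing)


start = Handoff(
    task="email the report",
    constraints=frozenset({"no_pii_in_messages", "no_external_tools"}),
)
drifted = lossy_delegate(start)                  # constraints are lost
kept = governed_delegate(start, lossy_delegate)  # constraints survive the hop
```

In this toy setting, `dropped_constraints(start, drifted)` returns both original constraints, while `kept` carries them forward unchanged. A real system would also have to handle weakened or reworded constraints, which a simple set difference cannot detect.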
