QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems
Yiliu Yang, Yilei Jiang, Qunzhong Wang, Yingshui Tan, Xiaoyong Zhu, Sherman S.M. Chow, Bo Zheng, Xiangyu Yue

TL;DR
QuadSentinel introduces a four-agent system that converts natural language safety policies into machine-checkable rules, improving safety enforcement in multi-agent systems with low costs and high accuracy.
Contribution
The paper presents QuadSentinel, a novel four-agent framework that translates ambiguous natural language policies into reliable, machine-checkable safety rules for multi-agent systems.
Findings
Improves guardrail accuracy and rule recall.
Reduces false positives in safety enforcement.
Outperforms single-agent baselines like ShieldAgent.
Abstract
Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context dependent, so they map poorly to machine-checkable rules, and runtime enforcement is unreliable. Expressing safety policies as sequents, we propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that compiles these policies into machine-checkable rules built from predicates over observable state and enforces them online. Referee logic plus an efficient top-k predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives.…
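To make the abstract's vocabulary concrete, here is a minimal sketch of a sequent-style rule check over predicates, with a top-k predicate refresh and a priority-based referee. It is an illustration under stated assumptions, not the paper's implementation: the names `Sequent`, `PREDICATES`, `update_truth`, and `referee`, the example rules, and the relevance scores are all hypothetical, and the LLM-driven agents are replaced here by plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = Dict[str, object]

@dataclass(frozen=True)
class Sequent:
    """antecedents |- consequent: if every antecedent holds, the consequent must hold."""
    name: str
    antecedents: FrozenSet[str]
    consequent: str
    priority: int  # assumption: a lower number means a higher-priority rule

# Hypothetical predicates over observable state (the paper's predicates may differ).
PREDICATES: Dict[str, Callable[[State], bool]] = {
    "is_purchase": lambda s: s.get("action") == "purchase",
    "user_confirmed": lambda s: bool(s.get("confirmed")),
    "external_message": lambda s: s.get("recipient") == "external",
    "no_pii": lambda s: not s.get("contains_pii", False),
}

# Hypothetical rules compiled offline from a natural-language policy.
RULES: List[Sequent] = [
    Sequent("confirm_before_purchase", frozenset({"is_purchase"}), "user_confirmed", 0),
    Sequent("no_pii_leak", frozenset({"external_message"}), "no_pii", 1),
]

def update_truth(cache: Dict[str, bool], state: State,
                 scores: Dict[str, float], k: int) -> Dict[str, bool]:
    """Re-evaluate only the k highest-scoring predicates; keep cached values for the rest."""
    top_k = sorted(scores, key=lambda name: (-scores[name], name))[:k]
    updated = dict(cache)
    for name in top_k:
        updated[name] = PREDICATES[name](state)
    return updated

def violated(rule: Sequent, truth: Dict[str, bool]) -> bool:
    """A rule is violated when all antecedents hold and the consequent does not.

    Predicates that were never evaluated are treated as False.
    """
    antecedents_hold = all(truth.get(a, False) for a in rule.antecedents)
    return antecedents_hold and not truth.get(rule.consequent, False)

def referee(rules: List[Sequent], truth: Dict[str, bool]) -> Tuple[str, Optional[str]]:
    """Block on the highest-priority violated rule; otherwise allow the action."""
    hits = sorted((r for r in rules if violated(r, truth)), key=lambda r: r.priority)
    return ("block", hits[0].name) if hits else ("allow", None)

# Usage: an unconfirmed purchase, with scores assumed to come from a relevance filter.
state = {"action": "purchase", "confirmed": False, "recipient": "internal"}
scores = {"is_purchase": 0.9, "user_confirmed": 0.8, "external_message": 0.1, "no_pii": 0.05}
truth = update_truth({}, state, scores, k=2)
decision = referee(RULES, truth)
```

Refreshing only the top-scoring predicates is what keeps per-step cost bounded in this sketch. Treating unevaluated predicates as false is one possible default, chosen here for simplicity.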
Peer Reviews
Decision · Submitted to ICLR 2026
Guardrails for agent systems are an interesting research direction. The paper develops a solution that achieves SOTA results.
1. Novelty is limited. The paper also does not clearly explain the fundamental difference from ShieldAgent. ShieldAgent uses probabilistic inference to compute a kind of "rule violation" score, while this paper uses an LLM in the loop to track related rules, observe violations, and make guardrail predictions. If I understand correctly, this is a fuzzy version of ShieldAgent with an LLM in the loop. The state tracker part may make it more efficient to only consider related rules with the LLM as a filter, but the eff
- This paper addresses a challenging and important problem: safety in multi-agent LLM systems.
- This paper proposes a relatively complete and thorough framework, capable of dynamic online monitoring with action- and trajectory-level safety enforcement for multi-agent systems. It also considers efficiency, e.g., incremental updating of predicates to avoid full re-evaluation, and risk-cost optimization.
- The translation from natural language policies to predicates and action and me
- While QuadSentinel translates natural-language policies into sequents, it is unclear how robust the system is to ambiguous or conflicting policies. The paper should include a clarification or discussion of this point.
- The framework may fail for risky actions or malicious attacks that are not covered by the registered policy book. It would be beneficial for the paper to discuss potential adaptivity or mitigation strategies for unseen threats.
- It would be better to show some case studies.
-
1. The paper proposes a novel multi-agent guard framework that translates natural-language safety policies into propositional logic and enforces them through coordinated agents, extending prior single-agent guardrails such as ShieldAgent and offering a clear conceptual contribution to runtime safety in multi-agent systems.
2. The proposed method performs well on the evaluated benchmarks, consistently outperforming baselines on multiple safety benchmarks, and the ablation study
1. The framework relies heavily on the correctness of the offline policy-to-rule translation step, but the paper provides limited quantitative evaluation of translation fidelity or failure cases, leaving uncertainty about robustness when policies are ambiguous or domain-shifted.
2. Although the method is positioned as low-overhead, the reported runtime analysis in the appendix is theoretical, and no actual latency, throughput, or cost measurements are provided for real deployments, especially u
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multi-Agent Systems and Negotiation · Formal Methods in Verification
