QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems

Yiliu Yang; Yilei Jiang; Qunzhong Wang; Yingshui Tan; Xiaoyong Zhu; Sherman S.M. Chow; Bo Zheng; Xiangyu Yue

arXiv:2512.16279·cs.AI·December 19, 2025

Open Access · 3 Reviews

TL;DR

QuadSentinel introduces a four-agent system that converts natural language safety policies into machine-checkable rules, improving safety enforcement in multi-agent systems with low costs and high accuracy.

Contribution

The paper presents QuadSentinel, a novel four-agent framework that translates ambiguous natural language policies into reliable, machine-checkable safety rules for multi-agent systems.

Findings

01. Improves guardrail accuracy and rule recall.
02. Reduces false positives in safety enforcement.
03. Outperforms single-agent baselines like ShieldAgent.

Abstract

Safety risks arise as large language model-based agents solve complex tasks with tools, multi-step plans, and inter-agent messages. However, deployer-written policies in natural language are ambiguous and context dependent, so they map poorly to machine-checkable rules, and runtime enforcement is unreliable. Expressing safety policies as sequents, we propose QuadSentinel, a four-agent guard (state tracker, policy verifier, threat watcher, and referee) that compiles these policies into machine-checkable rules built from predicates over observable state and enforces them online. Referee logic plus an efficient top-$k$ predicate updater keeps costs low by prioritizing checks and resolving conflicts hierarchically. Measured on ST-WebAgentBench (ICML CUA '25) and AgentHarm (ICLR '25), QuadSentinel improves guardrail accuracy and rule recall while reducing false positives.…
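As a rough illustration of the ideas named in the abstract (rules as sequents over predicates, a top-$k$ predicate update, and a referee that resolves conflicts by priority), the following minimal Python sketch may help. It is not the authors' implementation: the `Sequent` class, the predicate names, the relevance scores, and the priority scheme are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]
Predicate = Callable[[State], bool]


@dataclass(frozen=True)
class Sequent:
    """Hypothetical rule shape: antecedents |- consequent."""
    name: str
    antecedents: Tuple[str, ...]  # predicate names that must all hold
    consequent: str               # predicate name that must then hold
    priority: int                 # assumed tie-breaker; higher wins


def is_violated(rule: Sequent, truth: Dict[str, bool]) -> bool:
    # A sequent is violated when every antecedent holds but the consequent fails.
    return all(truth[p] for p in rule.antecedents) and not truth[rule.consequent]


def update_top_k(truth: Dict[str, bool], predicates: Dict[str, Predicate],
                 relevance: Dict[str, float], state: State, k: int) -> List[str]:
    # Re-evaluate only the k predicates with the highest (assumed) relevance score.
    chosen = sorted(relevance, key=lambda n: relevance[n], reverse=True)[:k]
    for name in chosen:
        truth[name] = predicates[name](state)
    return chosen


def referee(rules: List[Sequent], truth: Dict[str, bool]) -> Tuple[str, Optional[str]]:
    # Block on the highest-priority violated rule; allow if nothing is violated.
    hits = [r for r in rules if is_violated(r, truth)]
    if not hits:
        return ("allow", None)
    return ("block", max(hits, key=lambda r: r.priority).name)


# Hypothetical predicates over observable state.
PREDICATES: Dict[str, Predicate] = {
    "is_delete": lambda s: s.get("action") == "delete",
    "user_confirmed": lambda s: bool(s.get("confirmed", False)),
    "is_external_send": lambda s: s.get("action") == "send_external",
}
RULES = [Sequent("confirm_before_delete", ("is_delete",), "user_confirmed", priority=2)]


def check(state: State, relevance: Dict[str, float], k: int = 2) -> Tuple[str, Optional[str]]:
    # Simplification: truth values start at False on each call; a real
    # state tracker would keep them across steps.
    truth = {name: False for name in PREDICATES}
    update_top_k(truth, PREDICATES, relevance, state, k)
    return referee(RULES, truth)


relevance = {"is_delete": 0.9, "user_confirmed": 0.8, "is_external_send": 0.1}
print(check({"action": "delete", "confirmed": False}, relevance))  # ('block', 'confirm_before_delete')
print(check({"action": "delete", "confirmed": True}, relevance))   # ('allow', None)
```

In this sketch the cost saving comes from evaluating only `k` predicates per step; how relevance is scored and how stale truth values are handled are left open here and would need to follow the paper.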

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

Guardrails for agent systems are an interesting research direction. The paper develops a solution with SOTA results.

Weaknesses

1. Novelty is limited. The paper also does not clearly explain the fundamental difference from ShieldAgent. ShieldAgent uses probabilistic inference to compute a sort of "rule violation" score, while this paper uses an LLM in the loop to track related rules, observe violations, and make guardrail predictions. If I understand correctly, this is a fuzzy version of ShieldAgent with an LLM in the loop. The state tracker part may make it more efficient to only consider related rules with the LLM as a filter, but the eff

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- This paper addresses a challenging and important problem: safety in multi-agent LLM systems.
- This paper proposes a relatively complete and thorough framework, capable of dynamic online monitoring with action- and trajectory-level safety enforcement for multi-agent systems. It also considers the efficiency perspective, e.g., incremental updating of predicates to avoid full re-evaluation, and risk-cost optimization.
- The translation from natural language policies to predicates and action and me

Weaknesses

- While QuadSentinel translates natural-language policies into sequents, it is unclear how robust the system is to ambiguous or conflicting policies. It would be better to include such clarification or discussion in the paper.
- The framework may fail for risky actions or malicious attacks that are absent from the registered policy book. It would be beneficial for the paper to discuss potential adaptivity or mitigation strategies for unseen threats.
- It would be better to show some case studies.
-

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper proposes a novel multi-agent guard framework that translates natural-language safety policies into propositional logics and enforces them through coordinated agents, extending prior single-agent guardrails e.g. ShieldAgent, and offering a clear conceptual contribution to runtime safety in multi-agent systems.
2. The proposed method achieves good performance on the evaluated benchmarks, where it consistently outperforms baselines on multiple safety benchmarks, and the ablation study

Weaknesses

1. The framework relies heavily on the correctness of the offline policy-to-rule translation step, but the paper provides limited quantitative evaluation of translation fidelity or failure cases, leaving uncertainty about robustness when policies are ambiguous or domain-shifted.
2. Although the method is positioned as low-overhead, the reported runtime analysis in the appendix is theoretical, and no actual latency, throughput, or cost measurements are provided for real deployments, especially u

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multi-Agent Systems and Negotiation · Formal Methods in Verification