ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety

Haoyu Wang; Christopher M. Poskitt; Jiali Wei; Jun Sun

arXiv:2508.00500 · cs.AI · March 30, 2026


TL;DR

ProbGuard is a proactive runtime monitoring framework for LLM agents that predicts safety violations in advance using probabilistic models, enhancing safety in domains like autonomous driving and household robotics.

Contribution

ProbGuard introduces a probabilistic risk prediction approach based on Discrete-Time Markov Chains (DTMCs), with semantic constraints and PAC (probably approximately correct) guarantees, to anticipate and prevent unsafe behaviors in LLM agents.

Findings

01 Predicts traffic violations up to 38.66 seconds in advance.

02 Reduces unsafe behavior by up to 65.37%.

03 Maintains up to 80.4% task completion.

Abstract

Large Language Model (LLM) agents increasingly operate across domains such as robotics, virtual assistants, and web automation. However, their stochastic decision-making introduces safety risks that are difficult to anticipate during execution. Existing runtime monitoring frameworks, such as AgentSpec, primarily rely on reactive safety rules that detect violations only when unsafe behavior is imminent or has already occurred, limiting their ability to handle long-horizon dependencies. We present ProbGuard, a proactive runtime monitoring framework for LLM agents that anticipates safety violations through probabilistic risk prediction. ProbGuard abstracts agent executions into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces to model behavioral dynamics. At runtime, the monitor estimates the probability that future executions will reach unsafe states…
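
The idea of learning a DTMC from traces and querying it for risk can be illustrated with a minimal sketch. This is not the authors' implementation: the state names, the maximum-likelihood counting estimator, the fixed step horizon, and the function names `learn_dtmc` and `reach_probability` are all assumptions made for illustration. The sketch estimates transition probabilities from symbolic traces, then computes the probability of reaching an unsafe state within a bounded number of steps.

```python
from collections import defaultdict


def learn_dtmc(traces):
    """Estimate DTMC transition probabilities from symbolic traces.

    Assumption: plain maximum-likelihood estimation (transition counts
    divided by the total outgoing count of each state).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    dtmc = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        dtmc[src] = {dst: c / total for dst, c in successors.items()}
    return dtmc


def reach_probability(dtmc, start, unsafe, horizon):
    """Probability of reaching any state in `unsafe` within `horizon` steps.

    Uses backward value iteration for bounded reachability. States with
    no learned outgoing transitions are treated as terminal.
    """
    states = set(dtmc) | {t for succ in dtmc.values() for t in succ} | {start}
    prob = {s: (1.0 if s in unsafe else 0.0) for s in states}
    for _ in range(horizon):
        updated = {}
        for s in states:
            if s in unsafe:
                updated[s] = 1.0
            else:
                updated[s] = sum(p * prob[t] for t, p in dtmc.get(s, {}).items())
        prob = updated
    return prob[start]


# Hypothetical symbolic traces for a driving agent (illustrative only).
traces = [
    ["cruise", "approach", "stop"],
    ["cruise", "approach", "violation"],
    ["cruise", "cruise", "approach", "stop"],
]
dtmc = learn_dtmc(traces)
risk = reach_probability(dtmc, "cruise", {"violation"}, horizon=2)
print(f"estimated risk from 'cruise' within 2 steps: {risk:.3f}")
```

On these toy traces, the estimated two-step risk from `cruise` is about 0.25 (a 3/4 chance of moving to `approach`, followed by a 1/3 chance of `violation`). A runtime monitor could compare such an estimate against a threshold to decide when to intervene; how ProbGuard itself sets thresholds or intervenes is not specified in the text above.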
