TL;DR
PrefixGuard is a framework for creating online failure-warning monitors for LLM agents, using trace analysis and supervised learning to improve early detection of failures in tool-using tasks.
Contribution
PrefixGuard introduces a trace-to-monitor framework that pairs an offline StepView induction step with supervised monitor training, improving early failure detection over raw-text controls.
Findings
The strongest PrefixGuard monitors reach between 0.533 and 0.900 AUPRC across the four benchmarks evaluated.
Using the strongest backend within each representation, monitors improve over raw-text controls by an average of +0.137 AUPRC.
Finite-state automata remain compact on some benchmarks but expand on others.
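For readers unfamiliar with the metric, AUPRC can be estimated as average precision over a ranked list of risk scores. The sketch below is a minimal illustration only: the function name `average_precision` and the toy scores are hypothetical, and the paper's exact estimator and evaluation protocol are not stated in this summary.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision, a common estimator of AUPRC.

    labels use 1 for a failed run and 0 for a successful one (an assumption
    of this sketch). Tied scores keep their input order.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank runs from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / positives


# Toy data (hypothetical): two failed runs ranked 1st and 3rd out of four.
toy_ap = average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because the metric is computed over positives only, its chance level equals the failure rate of the benchmark, so absolute values are not directly comparable across benchmarks with different failure rates.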
Abstract
Large language model (LLM) agents now execute long, tool-using tasks where final outcome checks can arrive too late for intervention. Online warning requires lightweight prefix monitors over heterogeneous traces, but hand-authored event schemas are brittle and deployment-time LLM judging is costly. We introduce PrefixGuard, a trace-to-monitor framework with an offline StepView induction step followed by supervised monitor training. StepView induces deterministic typed-step adapters from raw trace samples, and the monitor learns an event abstraction and prefix-risk scorer from terminal outcomes. Across WebArena, -Bench, SkillsBench, and TerminalBench, the strongest PrefixGuard monitors reach 0.900/0.710/0.533/0.557 AUPRC. Using the strongest backend within each representation, they improve over raw-text controls by an average of +0.137 AUPRC. LLM judges remain substantially…
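The pipeline the abstract describes (deterministic adapter, event abstraction, prefix-risk scorer trained on terminal outcomes) can be sketched roughly as follows. This is not the authors' implementation. The record schema (`type`, `error`), every function name, and the smoothed log-odds scorer are assumptions standing in for StepView's induced adapters and the paper's learned monitor.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Step:
    kind: str    # hypothetical field, e.g. "tool_call"
    status: str  # hypothetical field: "ok" or "error"


def adapt(raw: dict) -> Step:
    """Deterministic adapter from a raw trace record to a typed step."""
    status = "error" if raw.get("error") else "ok"
    return Step(kind=str(raw.get("type", "unknown")), status=status)


def event_of(step: Step) -> str:
    """Event abstraction: collapse a typed step into a discrete symbol."""
    return f"{step.kind}:{step.status}"


def train(traces: Sequence[List[dict]], failed: Sequence[bool]) -> Dict[str, float]:
    """Fit add-one-smoothed log-odds per event from terminal outcomes only."""
    fail_counts, ok_counts = Counter(), Counter()
    for trace, is_failed in zip(traces, failed):
        target = fail_counts if is_failed else ok_counts
        target.update(event_of(adapt(raw)) for raw in trace)
    vocab = set(fail_counts) | set(ok_counts)
    fail_total = sum(fail_counts.values()) + len(vocab)
    ok_total = sum(ok_counts.values()) + len(vocab)
    return {
        e: math.log((fail_counts[e] + 1) / fail_total)
        - math.log((ok_counts[e] + 1) / ok_total)
        for e in vocab
    }


def prefix_risks(weights: Dict[str, float], trace: List[dict]) -> List[float]:
    """Risk after each step, computed online from the prefix seen so far."""
    total, risks = 0.0, []
    for raw in trace:
        total += weights.get(event_of(adapt(raw)), 0.0)  # unseen events add 0
        risks.append(1.0 / (1.0 + math.exp(-total)))
    return risks


def first_alarm(weights: Dict[str, float], trace: List[dict],
                threshold: float = 0.8) -> Optional[int]:
    """Index of the first step whose prefix risk crosses the threshold."""
    for i, risk in enumerate(prefix_risks(weights, trace)):
        if risk >= threshold:
            return i
    return None


# Toy traces (hypothetical): two successful runs and one failed run.
ok_run = [{"type": "tool_call"}, {"type": "tool_result"}]
bad_run = [{"type": "tool_call"}, {"type": "tool_result", "error": "timeout"}] * 2
weights = train([ok_run, ok_run, bad_run], [False, False, True])
alarm_step = first_alarm(weights, bad_run)
```

Only the terminal label of each run is used during training, yet the scorer emits a risk value at every step, which is what allows a warning before the final outcome check. At deployment the monitor is a dictionary lookup and a running sum, with no LLM call.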
