Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia, Rohan Sinha, Jingyun Yang, Zi-ang Cao, Rika Antonova, Marco Pavone, Jeannette Bohg

TL;DR
Sentinel is a runtime monitoring framework for generative robot policies that combines statistical temporal consistency checks and Vision Language Models to detect diverse failure modes, improving detection accuracy in robotic manipulation tasks.
Contribution
The paper introduces Sentinel, a novel framework that unifies temporal consistency measures and VLM-based detection to identify failure modes in generative policies.
Findings
Sentinel detects 18% more failures than either type of detector used on its own.
Combining detectors significantly improves failure detection accuracy.
The approach outperforms existing baselines in robotic manipulation scenarios.
Abstract
Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) Erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy at failure detection. Second, using a statistical…
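The two-detector split described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes the policy emits short chunks of future actions at each step, scores temporal consistency as the mean Euclidean distance between the overlapping parts of consecutive chunks, and calibrates the alarm threshold as a quantile of scores from successful rollouts. The distance, the calibration rule, and every name below (overlap_distance, cumulative_scores, calibrate_threshold, combined_alarm) are assumptions chosen for illustration, and the VLM check is reduced to a boolean supplied by the caller.

```python
import math
from typing import List, Sequence

Action = Sequence[float]
Chunk = Sequence[Action]  # a short sequence of predicted future actions


def overlap_distance(prev_chunk: Chunk, curr_chunk: Chunk, shift: int) -> float:
    """Mean Euclidean distance over the timesteps both chunks cover.

    prev_chunk was predicted `shift` steps before curr_chunk, so
    prev_chunk[shift + i] and curr_chunk[i] refer to the same timestep.
    """
    overlap = min(len(prev_chunk) - shift, len(curr_chunk))
    if overlap <= 0:
        raise ValueError("chunks do not overlap")
    total = sum(math.dist(prev_chunk[shift + i], curr_chunk[i]) for i in range(overlap))
    return total / overlap


def cumulative_scores(chunks: List[Chunk], shift: int) -> List[float]:
    """Running sum of overlap distances along one rollout."""
    scores: List[float] = []
    running = 0.0
    for prev, curr in zip(chunks, chunks[1:]):
        running += overlap_distance(prev, curr, shift)
        scores.append(running)
    return scores


def calibrate_threshold(success_rollouts: List[List[Chunk]], shift: int,
                        quantile: float = 0.95) -> float:
    """Pick the threshold as an empirical quantile of final scores on successful rollouts."""
    if not success_rollouts:
        raise ValueError("need at least one calibration rollout")
    finals = []
    for rollout in success_rollouts:
        scores = cumulative_scores(rollout, shift)
        finals.append(scores[-1] if scores else 0.0)
    finals.sort()
    idx = max(0, min(len(finals) - 1, math.ceil(quantile * len(finals)) - 1))
    return finals[idx]


def combined_alarm(erratic_score: float, threshold: float,
                   vlm_flags_no_progress: bool) -> bool:
    """Raise an alarm if either detector fires (hypothetical OR-combination)."""
    return erratic_score > threshold or vlm_flags_no_progress


# Toy rollouts with 2-D actions: one consistent, one that changes its plan abruptly.
steady = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]]
erratic = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(1.0, 3.0), (2.0, 4.0), (3.0, 0.0)]]

threshold = calibrate_threshold([steady, steady], shift=1)
erratic_score = cumulative_scores(erratic, shift=1)[-1]
print(combined_alarm(erratic_score, threshold, vlm_flags_no_progress=False))
```

In this toy setup the consistency score catches the erratic rollout by itself, while a policy that is consistent but makes no task progress would only be caught through the vlm_flags_no_progress input, which is why the two checks are combined with a logical OR.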
Taxonomy
Topics: Complex Systems and Decision Making · Systems Engineering Methodologies and Applications · Scientific Computing and Data Management
