Foundational World Models Accurately Detect Bimanual Manipulator Failures
Isaac R. Ward, Michelle Ho, Houjun Liu, Aaron Feldman, Joseph Vincent, Liam Kruse, Sean Cheong, Duncan Eddy, Mykel J. Kochenderfer, Mac Schwager

TL;DR
This paper introduces a probabilistic world model, built on a pretrained vision foundation model, that detects failures in bimanual manipulators. It reports a higher failure detection rate than baseline anomaly detection methods while using roughly 1/20th of the parameters.
Contribution
It develops a novel, uncertainty-based failure detection method leveraging a compressed latent space and conformal prediction, and introduces a new dataset for bimanual manipulation failure analysis.
Findings
Outperforms baseline anomaly detection methods in failure detection rate.
Uses approximately 1/20th of the parameters of comparable models.
Achieves 3.8% higher failure detection accuracy than the next-best approach.
Abstract
Deploying visuomotor robots at scale is challenging due to the potential for anomalous failures to degrade performance, cause damage, or endanger human life. Bimanual manipulators are no exception; these robots have vast state spaces composed of high-dimensional images and proprioceptive signals. Explicitly defining failure modes within such state spaces is infeasible. In this work, we overcome these challenges by training a probabilistic, history-informed world model within the compressed latent space of a pretrained vision foundation model (NVIDIA's Cosmos Tokenizer). The model outputs uncertainty estimates alongside its predictions, and these estimates serve as non-conformity scores within a conformal prediction framework. We use these scores to develop a runtime monitor, correlating periods of high uncertainty with anomalous failures. To test these methods, we use the simulated Push-T environment…
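To make the conformal prediction step concrete, here is a minimal Python sketch of how scalar uncertainty scores could be turned into a runtime failure flag using standard split conformal calibration. It is an illustration under stated assumptions, not the authors' implementation: the function names `conformal_threshold` and `flag_anomalies`, the miscoverage level `alpha`, and the use of one scalar score per timestep are all hypothetical choices made for this example.

```python
import math

import numpy as np


def conformal_threshold(calibration_scores, alpha=0.1):
    """Split-conformal threshold from non-conformity scores of nominal runs.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    If the calibration set is too small for the requested alpha, the
    threshold is infinite, so nothing is ever flagged.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return float(scores[rank - 1])


def flag_anomalies(uncertainty_stream, threshold):
    """Return the timesteps whose uncertainty score exceeds the threshold."""
    return [t for t, score in enumerate(uncertainty_stream) if score > threshold]


# Hypothetical calibration scores collected from failure-free rollouts.
calibration = np.arange(1, 21, dtype=float)
threshold = conformal_threshold(calibration, alpha=0.1)

# Hypothetical uncertainty scores observed at runtime.
runtime_scores = [0.5, 3.0, 25.0, 19.0, 30.0]
flagged = flag_anomalies(runtime_scores, threshold)
```

Under the usual exchangeability assumption, a score from a nominal timestep exceeds this threshold with probability at most `alpha`, which is what lets sustained exceedances be read as evidence of an anomalous failure.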
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Robotics and Sensor-Based Localization · Robot Manipulation and Learning
