Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation
Paul Pacaud, Ricardo Garcia, Shizhe Chen, Cordelia Schmid

TL;DR
This paper introduces FailCoT, a large-scale dataset for robotic failure reasoning, and Guardian, a vision-language model that improves failure detection and task success in robotic manipulation by leveraging this data.
Contribution
The paper presents a novel automatic framework for generating diverse robotic failure data and a new model, Guardian, that achieves state-of-the-art results in failure reasoning and task success.
Findings
Guardian outperforms previous models on three real-world benchmarks.
Scaling failure reasoning data significantly improves generalization in robotic failure detection.
Integration with LLM-based policies enhances task success rates in real-world robot deployment.
Abstract
Robust robotic manipulation requires reliable failure detection and recovery. Although recent Vision-Language Models (VLMs) show promise in robot failure detection, their generalization is severely limited by the scarcity and narrow coverage of failure data. To address this bottleneck, we propose an automatic framework for generating diverse robotic planning and execution failures across both simulated and real-world environments. Our approach perturbs successful manipulation trajectories to synthesize failures that reflect realistic failure distributions, and leverages VLMs to produce structured step-by-step reasoning traces. This yields FailCoT, a large-scale failure reasoning dataset built upon the RLBench simulator and the BridgeDataV2 real-robot dataset. Using FailCoT, we train Guardian, a multi-view reasoning VLM for unified planning and execution verification. Guardian achieves…
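The abstract's core idea, perturbing a successful trajectory to obtain a labeled failure, can be illustrated with a small sketch. This is not the authors' pipeline: the `Step` record, the two perturbation functions, the noise magnitude, and the metadata fields are all hypothetical, chosen only to show how a planning failure (wrong step order) and an execution failure (wrong end-effector position) might be derived from one successful plan.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical trajectory step; the paper's actual data format is not given here.
@dataclass(frozen=True)
class Step:
    action: str                           # e.g. "grasp", "move", "release"
    target: str                           # object the action refers to
    position: Tuple[float, float, float]  # assumed end-effector goal, in metres

Trajectory = List[Step]

def perturb_execution(traj: Trajectory, rng: random.Random,
                      noise: float = 0.05) -> Tuple[Trajectory, Dict]:
    """Shift one step's goal position by +/- noise on each axis (assumed failure mode)."""
    i = rng.randrange(len(traj))
    x, y, z = traj[i].position
    dx, dy, dz = (rng.choice([-1.0, 1.0]) * noise for _ in range(3))
    failed = list(traj)
    failed[i] = replace(traj[i], position=(x + dx, y + dy, z + dz))
    return failed, {"kind": "execution", "step": i, "label": "failure"}

def perturb_plan(traj: Trajectory, rng: random.Random) -> Tuple[Trajectory, Dict]:
    """Swap two adjacent steps so the plan runs in the wrong order (assumed failure mode)."""
    i = rng.randrange(len(traj) - 1)
    failed = list(traj)
    failed[i], failed[i + 1] = failed[i + 1], failed[i]
    return failed, {"kind": "planning", "step": i, "label": "failure"}

# A made-up successful trajectory used only for illustration.
DEMO: Trajectory = [
    Step("move", "cup", (0.30, 0.10, 0.20)),
    Step("grasp", "cup", (0.30, 0.10, 0.05)),
    Step("release", "cup", (0.50, -0.10, 0.05)),
]

if __name__ == "__main__":
    rng = random.Random(0)
    for perturb in (perturb_execution, perturb_plan):
        failed, meta = perturb(DEMO, rng)
        print(meta, [s.action for s in failed])
```

Each perturbed trajectory keeps a record of what was changed and where. In a pipeline like the one the abstract describes, that kind of record could presumably seed the step-by-step reasoning trace a VLM is asked to produce, though the abstract does not specify how.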
