Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
Xianchao Zeng, Xinyu Zhou, Youcheng Li, Jiayou Shi, Tianle Li, Liangming Chen, Lei Ren, Yong-Lu Li

TL;DR
This paper introduces ViFailback, a framework and dataset that use visual symbols to diagnose and correct robotic manipulation failures, with the goal of improving failure recovery on real-world robots.
Contribution
The paper presents ViFailback, a framework for failure diagnosis and correction in robotic manipulation, together with a large-scale dataset of real-world trajectories and a benchmark, ViFailback-Bench, built on that dataset.
Findings
ViFailback-Bench enables detailed evaluation of VLMs in failure scenarios.
ViFailback-8B significantly improves failure diagnosis and correction performance.
Integration with VLA models helps robots recover from manipulation failures.
Abstract
Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in failure diagnosis and learning from failures. Additionally, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. In light of these, we introduce ViFailback, a framework designed to diagnose robotic manipulation failures and provide both textual and visual correction guidance. Our framework utilizes explicit visual symbols to enhance annotation efficiency. We further release the ViFailback dataset, a large-scale collection of 58,126 Visual Question Answering (VQA) pairs along with their corresponding 5,202 real-world manipulation trajectories. Based on the dataset, we establish ViFailback-Bench, a benchmark of 11 fine-grained VQA tasks designed to assess the failure diagnosis and…
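As a quick sanity check on the dataset figures quoted in the abstract, the short Python sketch below computes the average number of VQA pairs per trajectory. Only the two counts come from the abstract; the function and variable names are illustrative, and the result is a mean that says nothing about how pairs are actually distributed across trajectories.

```python
# Counts quoted in the abstract; everything else here is illustrative.
NUM_VQA_PAIRS = 58_126
NUM_TRAJECTORIES = 5_202


def mean_pairs_per_trajectory(pairs: int, trajectories: int) -> float:
    """Return the average number of VQA pairs per trajectory."""
    if trajectories <= 0:
        raise ValueError("trajectories must be positive")
    return pairs / trajectories


ratio = mean_pairs_per_trajectory(NUM_VQA_PAIRS, NUM_TRAJECTORIES)
print(f"{ratio:.2f} VQA pairs per trajectory on average")
```

On these figures the average comes out to roughly 11 VQA pairs per trajectory.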
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Advanced Neural Network Applications
