Unsupervised Discovery of Failure Taxonomies from Deployment Logs
Aryaman Gupta, Yusuf Umut Ciftci, Somil Bansal

TL;DR
This paper presents an unsupervised method to automatically discover meaningful failure categories from large-scale robotic failure logs, aiding system debugging and robustness improvement.
Contribution
It introduces a novel approach combining multimodal reasoning and semantic clustering to identify recurring failure modes without manual labeling.
Findings
The method discovers interpretable failure taxonomies across multiple robotic domains
Structured failure modes improve targeted data collection and policy refinement
The method outperforms baseline clustering approaches in coherence and usefulness
Abstract
As robotic systems become increasingly integrated into real-world environments, ranging from autonomous vehicles to household assistants, they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving system robustness. However, manually analyzing large-scale failure datasets is impractical and does not scale. In this work, we introduce the problem of unsupervised discovery of failure taxonomies from large volumes of raw failure logs, aiming to obtain semantically coherent and actionable failure modes directly from perceptual trajectories. Our approach first infers structured failure explanations from multimodal inputs using vision-language reasoning, and then performs clustering in the resulting semantic reasoning space, enabling the discovery of…
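The two-stage idea in the abstract (explain each failure in text, then cluster the explanations) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the vision-language step has already produced one short explanation string per failure log, and the example strings are hypothetical. A toy bag-of-words vector stands in for a learned text embedding, and a greedy cosine-threshold rule stands in for whatever clustering algorithm the paper uses. The names `embed`, `cosine`, `cluster_explanations`, and `threshold` are illustrative only.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_explanations(explanations, threshold=0.5):
    """Greedy clustering: each explanation joins the first cluster whose
    seed vector is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # list of (seed_vector, member_indices)
    for i, text in enumerate(explanations):
        vec = embed(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical explanations, as if produced by a vision-language model.
explanations = [
    "robot collided with glass door",
    "robot collided with glass wall",
    "gripper dropped the cup",
    "gripper dropped the mug",
]
groups = cluster_explanations(explanations)
print(groups)
```

In this toy run the two collision explanations fall into one group and the two grasping explanations into another. Clustering in the explanation space rather than on raw sensor data is what lets each group be read as a named failure mode.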
Taxonomy
Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
