Enhancing Video-Based Robot Failure Detection Using Task Knowledge

Santosh Thoduka; Sebastian Houben; Juergen Gall; Paul G. Plöger

arXiv:2508.18705 · cs.RO · September 24, 2025

TL;DR

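This paper introduces a video-based failure detection method for robots that uses spatio-temporal knowledge of the robot's actions and the task-relevant objects in its field of view. The approach is evaluated on three datasets, and a data augmentation method based on variable frame rates improves performance further.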

Contribution

The paper proposes a novel failure detection approach using task-relevant spatio-temporal knowledge and introduces a data augmentation technique to enhance performance.

Findings

01

F1 score improved from 77.9 to 80.0 with the new method

02

Further increased to 81.4 with test-time augmentation

03

Effective use of spatio-temporal information for failure detection
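
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the function name and the example counts are hypothetical and are not taken from the paper. It returns a value in [0, 1], whereas the scores reported above appear to be on a 0 to 100 scale.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts.

    tp: failures correctly detected
    fp: successful executions wrongly flagged as failures
    fn: failures that were missed
    """
    if tp == 0:
        # No true positives means precision and recall are both zero
        # (or undefined), so F1 is reported as zero by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {100 * score:.1f}")
```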

Abstract

Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to…
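
The abstract is cut off before it describes the augmentation in full, so the following is only a minimal sketch of one possible reading, assuming that a clip is split into consecutive segments and each segment is subsampled at its own randomly chosen stride. The function name, the segment count, and the stride choices are all hypothetical and are not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence


def variable_rate_indices(
    num_frames: int,
    num_segments: int = 4,
    strides: Sequence[int] = (1, 2, 4),
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick frame indices so that each segment of a clip has its own frame rate.

    Assumption (not from the paper): the clip is divided into equal-length
    segments, and each segment is subsampled with a stride drawn at random
    from `strides`.
    """
    if num_frames <= 0:
        return []
    rng = rng or random.Random()
    num_segments = max(1, min(num_segments, num_frames))
    # Segment boundaries: bounds[i] is the first frame of segment i.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    indices: List[int] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        stride = rng.choice(list(strides))
        indices.extend(range(start, end, stride))
    return indices


# Example: choose frames from a 32-frame clip with a fixed seed.
idx = variable_rate_indices(32, num_segments=4, rng=random.Random(0))
print(idx)
```

The returned indices can be used to select frames from a decoded video array before it is passed to a model. Seeding the generator keeps the augmentation reproducible.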
