Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge
Constantin Patsch, Marsil Zakour, Yuankai Wu, Eckehard Steinbach

TL;DR
This report presents an online mistake detection framework for egocentric video that identifies both procedural and execution errors in real time and uses an LLM to generate explanations. The approach placed second on the HoloAssist mistake detection benchmark.
Contribution
We introduce a novel online mistake detection system capable of identifying both procedural and execution errors in egocentric videos, enhanced with LLM-based explanatory feedback.
Findings
Achieved second place on the HoloAssist mistake detection benchmark.
Detects a broad range of errors, both procedural and execution, in real time.
Utilizes LLMs for generating human-readable error explanations.
Abstract
In this report, we address the task of online mistake detection, which is vital in domains like industrial automation and education, where real-time video analysis allows human operators to correct errors as they occur. While previous work focuses on procedural errors involving action order, broader error types must be addressed for real-world use. We introduce an online mistake detection framework that handles both procedural and execution errors (e.g., motor slips or tool misuse). Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach, which placed second on the mistake detection task.
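The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical upstream model that emits, for each recognized action, a label and an execution-error score in [0, 1]. The class name `OnlineMistakeDetector`, the prerequisite table, the threshold, and the prompt wording are all assumptions made for illustration. The sketch flags a procedural error when an action arrives before its prerequisites and an execution error when the score exceeds the threshold, then builds a text prompt that could be passed to an LLM of one's choice.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineMistakeDetector:
    # Hypothetical task graph: action -> set of actions that must come first.
    prerequisites: dict
    # Hypothetical threshold on an upstream execution-error score in [0, 1].
    exec_threshold: float = 0.5
    # Actions observed so far in the stream.
    done: set = field(default_factory=set)

    def step(self, action, exec_score):
        """Process one recognized action as it arrives; return detected mistakes."""
        mistakes = []
        # Procedural check: were all prerequisite actions already observed?
        missing = self.prerequisites.get(action, set()) - self.done
        if missing:
            mistakes.append(("procedural", action, sorted(missing)))
        # Execution check: does the upstream score suggest a faulty execution?
        if exec_score > self.exec_threshold:
            mistakes.append(("execution", action, exec_score))
        self.done.add(action)
        return mistakes


def build_explanation_prompt(mistake):
    """Turn a detected mistake into a prompt string for an LLM (no LLM call here)."""
    kind, action, detail = mistake
    if kind == "procedural":
        steps = ", ".join(detail)
        return (f"The user performed '{action}' before completing: {steps}. "
                "Explain the mistake and how to correct it.")
    return (f"The user performed '{action}' with error score {detail:.2f}. "
            "Explain what may have gone wrong and how to correct it.")


# Example stream with made-up action names and scores.
detector = OnlineMistakeDetector({"insert_cartridge": {"open_cover"}})
first = detector.step("insert_cartridge", 0.1)
second = detector.step("open_cover", 0.9)
prompt = build_explanation_prompt(first[0])
```

Because each call to `step` uses only the actions seen so far, the check runs online, one action at a time, rather than requiring the full video.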
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
