MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos
Sagnik Majumder, Anish Nethi, Ziad Al-Halah, Kristen Grauman

TL;DR
MistExit introduces an early mistake detection method for procedural videos. It pairs a mistake detector with a reinforcement learning exit policy so that it can identify errors accurately while observing as little of the video as possible.
Contribution
The paper introduces the task of early mistake detection in video. It also presents a novel approach that combines a mistake detector with an adaptive early-exit policy, enabling efficient and accurate analysis of procedural videos.
Findings
Outperforms state-of-the-art methods in mistake detection accuracy
Reduces the fraction of each video that must be observed before predicting
Effective across diverse real-world procedural video datasets
Abstract
We introduce the task of early mistake detection in video, where the goal is to determine whether a keystep in a procedural activity is performed correctly while observing as little of the streaming video as possible. To tackle this problem, we propose a method comprising a mistake detector and a reinforcement learning policy. At each timestep, the detector processes recently observed frames to estimate the keystep's correctness while anticipating future visual features, enabling reliable early mistake estimates. Meanwhile, the policy aggregates the detector outputs and visual observations over time and adaptively decides when to exit (i.e., stop processing incoming frames) while producing the final prediction. Using diverse real-world procedural video datasets, we demonstrate that our MistExit model achieves superior mistake detection accuracy while reducing the fraction of video…
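The streaming loop described in the abstract can be illustrated with a toy sketch. It assumes that each frame is reduced to a single scalar feature. Every name in it (`toy_detector`, `threshold_policy`, `stream_with_early_exit`, `ExitDecision`) is hypothetical. The detector here is a fixed stand-in that scores a recent window plus a naive linearly extrapolated "anticipated" feature. The exit rule is a hand-set confidence threshold on the running mean of detector outputs, standing in for the learned reinforcement learning policy in the paper. It shows only the control flow: observe a frame, update the estimate, decide whether to exit, and report what fraction of the video was consumed.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ExitDecision:
    is_mistake: bool          # final prediction for the keystep
    exit_step: int            # number of frames observed before exiting
    fraction_observed: float  # exit_step / total number of frames


def toy_detector(window: Sequence[float]) -> float:
    """Hypothetical stand-in for the mistake detector.

    Extrapolates one 'anticipated' future feature linearly from the window,
    averages it with the observed features, and squashes to a probability.
    """
    slope = (window[-1] - window[0]) / max(len(window) - 1, 1)
    anticipated = window[-1] + slope
    score = (sum(window) + anticipated) / (len(window) + 1)
    return 1.0 / (1.0 + math.exp(-score))


def threshold_policy(history: List[float], threshold: float) -> bool:
    """Hypothetical stand-in for the learned exit policy: exit once the
    running mean of detector outputs is confidently far from 0.5."""
    mean_p = sum(history) / len(history)
    return abs(mean_p - 0.5) * 2.0 >= threshold


def stream_with_early_exit(frames: Sequence[float],
                           window_size: int = 4,
                           threshold: float = 0.6) -> ExitDecision:
    """Process frames one at a time and stop as soon as the policy exits."""
    if not frames:
        raise ValueError("need at least one frame")
    history: List[float] = []
    for t in range(1, len(frames) + 1):
        window = frames[max(0, t - window_size):t]  # recently observed frames
        history.append(toy_detector(window))
        # Exit when the policy is confident, or when the video ends.
        if threshold_policy(history, threshold) or t == len(frames):
            mean_p = sum(history) / len(history)
            return ExitDecision(mean_p > 0.5, t, t / len(frames))
    raise AssertionError("unreachable")
```

With strongly mistake-like features such as `stream_with_early_exit([3.0] * 10)`, the toy policy exits after the first frame. With ambiguous features such as `[0.0] * 10`, it never becomes confident and watches the whole clip. This is the accuracy versus observation trade-off that a learned policy is meant to balance adaptively.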
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Visual Attention and Saliency Detection
