MistExit: Learning to Exit for Early Mistake Detection in Procedural Videos

Sagnik Majumder; Anish Nethi; Ziad Al-Halah; Kristen Grauman

arXiv:2603.14252·cs.CV·March 17, 2026

TL;DR

MistExit is an early mistake detection method for procedural videos that pairs a mistake detector with a reinforcement learning policy, so that errors are identified accurately while as little of the video as possible is observed.

Contribution

The paper presents an approach that integrates a mistake detector with an adaptive early-exit policy for efficient and accurate analysis of procedural videos.

Findings

01 Outperforms state-of-the-art in mistake detection accuracy

02 Reduces video observation time significantly

03 Effective across diverse real-world datasets

Abstract

We introduce the task of early mistake detection in video, where the goal is to determine whether a keystep in a procedural activity is performed correctly while observing as little of the streaming video as possible. To tackle this problem, we propose a method comprising a mistake detector and a reinforcement learning policy. At each timestep, the detector processes recently observed frames to estimate the keystep's correctness while anticipating future visual features, enabling reliable early mistake estimates. Meanwhile, the policy aggregates the detector outputs and visual observations over time and adaptively decides when to exit (i.e., stop processing incoming frames) while producing the final prediction. Using diverse real-world procedural video datasets, we demonstrate that our MistExit model achieves superior mistake detection accuracy while reducing the fraction of video…
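
The streaming loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_early_exit`, `toy_detector`, `toy_policy`, the `window` and `margin` parameters, and the use of plain floats as stand-ins for frame features are all hypothetical. In particular, the exit rule here is a hand-written confidence threshold rather than a learned reinforcement learning policy, and the toy detector does not anticipate future visual features.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ExitResult:
    is_mistake: bool          # final prediction for the keystep
    frames_observed: int      # number of frames processed before exiting
    fraction_observed: float  # frames_observed / total frames


def run_early_exit(
    frames: Sequence[float],
    detector: Callable[[Sequence[float]], float],
    policy: Callable[[List[float]], Optional[bool]],
    window: int = 4,
) -> ExitResult:
    """Process a stream frame by frame and stop as soon as the policy decides.

    `detector` maps the most recent frames to a mistake probability.
    `policy` sees all detector outputs so far and returns None to keep
    watching, or True/False as the final prediction when it exits.
    """
    if not frames:
        raise ValueError("need at least one frame")
    probs: List[float] = []
    for t in range(1, len(frames) + 1):
        recent = frames[max(0, t - window):t]  # only recently observed frames
        probs.append(detector(recent))
        decision = policy(probs)
        if decision is not None:  # the policy chose to exit
            return ExitResult(decision, t, t / len(frames))
    # Stream ended without an exit: fall back to the last detector output.
    return ExitResult(probs[-1] >= 0.5, len(frames), 1.0)


def toy_detector(recent: Sequence[float]) -> float:
    # Hypothetical stand-in: treat each "frame" as a per-frame mistake score.
    return sum(recent) / len(recent)


def toy_policy(probs: List[float], margin: float = 0.3) -> Optional[bool]:
    # Hypothetical threshold rule (not RL): exit once the running average
    # of detector outputs is at least `margin` away from 0.5.
    avg = sum(probs) / len(probs)
    if abs(avg - 0.5) >= margin:
        return avg >= 0.5
    return None


if __name__ == "__main__":
    print(run_early_exit([0.9] * 10, toy_detector, toy_policy))
```

The sketch separates the two roles the abstract names: the detector only looks at a short window of recent frames, while the policy aggregates detector outputs over time and owns both the exit decision and the final prediction. A learned policy would replace `toy_policy` without changing the loop.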

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Visual Attention and Saliency Detection