Transparent and Coherent Procedural Mistake Detection

Shane Storks; Itamar Bar-Yossef; Yayuan Li; Zheyuan Zhang; Jason J. Corso; Joyce Chai

arXiv:2412.11927 · cs.AI · December 11, 2025

TL;DR

This paper introduces a transparent approach to procedural mistake detection using visual self-dialog rationales and benchmarks the performance of vision-language models, highlighting current limitations and avenues for enhancement.

Contribution

The paper reformulates procedural mistake detection to require visual self-dialog rationales and develops two automated, NLI-based metrics for the coherence of those rationales, providing new insight into model transparency and performance.

Findings

01 Off-the-shelf VLMs struggle with PMD

02 Incorporating coherence metrics improves accuracy and efficiency

03 Visual rationales enhance transparency in mistake detection

Abstract

Procedural mistake detection (PMD) is a challenging problem of classifying whether a human user (observed through egocentric video) has successfully executed a task (specified by a procedural text). Despite significant recent efforts, machine performance in the wild remains nonviable, and the reasoning processes underlying this performance are opaque. As such, we extend PMD to require generating visual self-dialog rationales to inform decisions. Given the impressive, mature image understanding capabilities observed in recent vision-and-language models (VLMs), we curate a suitable benchmark dataset for PMD based on individual frames. As our reformulation enables unprecedented transparency, we leverage a natural language inference (NLI) model to formulate two automated metrics for the coherence of generated rationales. We establish baselines for this reframed task, showing that VLMs…

Taxonomy

Topics: Software Engineering Research · Digital and Cyber Forensics

Methods: Softmax · Attention Is All You Need