Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos

Yayuan Li; Aadit Jain; Filippos Bellos; Jason J. Corso

arXiv:2511.20525 · cs.CV · March 27, 2026

TL;DR

This paper introduces Mistake Attribution (MATT), a task for fine-grained understanding of human mistakes in egocentric videos, together with two new datasets and a unified model that outperforms task-specific state-of-the-art methods on multiple mistake understanding tasks.

Contribution

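The paper defines MATT, a new task that attributes a mistake to the violated part of the instruction, the moment the deviation becomes irreversible, and the location of the mistake in the frame. It also presents MisEngine, a data engine that builds mistake datasets from existing corpora, and MisFormer, a unified attention-based model that outperforms task-specific state-of-the-art methods.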

Findings

01. MisEngine creates large, attribution-rich mistake datasets from existing data.

02. MisFormer outperforms task-specific SOTA methods in multiple mistake understanding tasks.

03. EPIC-KITCHENS-M and Ego4D-M are reliable benchmarks for mistake analysis.

Abstract

We introduce Mistake Attribution (MATT), a new task for fine-grained understanding of human mistakes in egocentric videos. While prior work detects whether a mistake occurs, MATT attributes the mistake to what part of the instruction is violated (semantic role), when in the video the deviation becomes irreversible (the Point-of-No-Return, PNR), and where the mistake appears in the PNR frame. We develop MisEngine, a data engine that automatically constructs mistake samples from existing datasets with attribution-rich annotations. Applied to large egocentric corpora, MisEngine yields EPIC-KITCHENS-M and Ego4D-M -- two datasets up to two orders of magnitude larger than prior mistake datasets. We then present MisFormer, a unified attention-based model for mistake attribution across semantic, temporal, and spatial dimensions, trained with MisEngine supervision. A human study demonstrates the…
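
To make the three attribution dimensions concrete, the sketch below shows one way a single MATT-style annotation could be represented. It is not the authors' schema: the class name `MistakeAttribution`, its field names, the role label "object", and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MistakeAttribution:
    """Hypothetical container for one annotation (illustrative, not the paper's format)."""

    instruction: str              # instruction text the video is checked against
    violated_role: Optional[str]  # semantic: which part of the instruction is violated
    pnr_frame: Optional[int]      # temporal: frame index of the Point-of-No-Return
    box_xyxy: Optional[Tuple[float, float, float, float]]  # spatial: region in the PNR frame

    def is_mistake(self) -> bool:
        # Assumption: a sample with no violated role is treated as a correct execution.
        return self.violated_role is not None

    def pnr_seconds(self, fps: float) -> Optional[float]:
        # Convert the PNR frame index to a timestamp for a given frame rate.
        if self.pnr_frame is None:
            return None
        return self.pnr_frame / fps


# Made-up sample values, for illustration only.
sample = MistakeAttribution(
    instruction="cut the onion",
    violated_role="object",
    pnr_frame=45,
    box_xyxy=(120.0, 80.0, 260.0, 210.0),
)
correct = MistakeAttribution("cut the onion", None, None, None)
```

Keeping all three attributions optional lets the same record describe both mistaken and correct executions, which is one plausible design when a detector and an attributor share a data format.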

Code & Models

Datasets

mistakeattribution/MATT-Bench
dataset · 21 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning