Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
Yayuan Li, Aadit Jain, Filippos Bellos, Jason J. Corso

TL;DR
This paper introduces Mistake Attribution (MATT), a task for fine-grained understanding of human mistakes in egocentric videos, together with a data engine (MisEngine), two new datasets (EPIC-KITCHENS-M and Ego4D-M), and a unified model (MisFormer) that outperforms existing methods across multiple tasks.
Contribution
The paper defines MATT, a task that attributes a mistake along semantic, temporal, and spatial dimensions; MisEngine, a data engine that builds mistake datasets from existing egocentric corpora; and MisFormer, a unified model that outperforms task-specific state-of-the-art methods.
Findings
MisEngine creates large, attribution-rich mistake datasets from existing data.
MisFormer outperforms task-specific SOTA methods in multiple mistake understanding tasks.
EPIC-KITCHENS-M and Ego4D-M are reliable benchmarks for mistake analysis.
Abstract
We introduce Mistake Attribution (MATT), a new task for fine-grained understanding of human mistakes in egocentric videos. While prior work detects whether a mistake occurs, MATT attributes the mistake to what part of the instruction is violated (semantic role), when in the video the deviation becomes irreversible (the Point-of-No-Return, PNR), and where the mistake appears in the PNR frame. We develop MisEngine, a data engine that automatically constructs mistake samples from existing datasets with attribution-rich annotations. Applied to large egocentric corpora, MisEngine yields EPIC-KITCHENS-M and Ego4D-M -- two datasets up to two orders of magnitude larger than prior mistake datasets. We then present MisFormer, a unified attention-based model for mistake attribution across semantic, temporal, and spatial dimensions, trained with MisEngine supervision. A human study demonstrates the…
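The three attribution outputs described in the abstract (what, when, where) can be pictured as a single record per video. The sketch below is only an illustration under stated assumptions: the class name, field names, the bounding-box encoding, the frame-rate helper, and the example instruction are all hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MistakeAttribution:
    """Hypothetical record for one MATT prediction (all names are illustrative)."""

    # Instruction text the video is checked against.
    instruction: str
    # "What": the semantic role of the instruction that is violated,
    # e.g. "verb" or "object"; None means no mistake was found.
    violated_role: Optional[str]
    # "When": frame index of the Point-of-No-Return (PNR).
    pnr_frame: Optional[int]
    # "Where": region of the mistake in the PNR frame. An (x1, y1, x2, y2)
    # pixel box is an assumption; the abstract does not specify the format.
    box_xyxy: Optional[Tuple[float, float, float, float]]

    @property
    def is_mistake(self) -> bool:
        # A sample counts as a mistake when some semantic role is violated.
        return self.violated_role is not None


def pnr_seconds(attr: MistakeAttribution, fps: float) -> Optional[float]:
    """Convert the PNR frame index to a timestamp in seconds."""
    if attr.pnr_frame is None:
        return None
    return attr.pnr_frame / fps
```

A record with `violated_role="object"`, `pnr_frame=90`, and a box in the PNR frame would then describe a video in which the wrong object is used, with the deviation becoming irreversible three seconds into a 30 fps clip.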
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
