EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural Texts

Yuto Haneji; Taichi Nishimura; Hirotaka Kameko; Keisuke Shirai; Tomoya Yoshida; Keiya Kajimura; Koki Yamamoto; Taiyu Cui; Tomohiro Nishimoto; Shinsuke Mori

arXiv:2410.05343 · cs.CV · August 1, 2025

TL;DR

This paper introduces EgoOops, a new egocentric video dataset with procedural texts for mistake detection, and proposes a method leveraging text-video alignment to improve error identification.

Contribution

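The paper presents the EgoOops dataset, which pairs egocentric videos with procedural texts across diverse domains and provides three types of annotations (video-text alignment, mistake labels, and mistake descriptions), and a mistake detection approach that combines video-text alignment with mistake label classification.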

Findings

01 Procedural texts are crucial for mistake detection.

02 The proposed method improves mistake detection accuracy.

03 The EgoOops dataset enables research across various domains.

Abstract

Mistake action detection is crucial for developing intelligent archives that detect workers' errors and provide feedback. Existing studies have focused on visually apparent mistakes in free-style activities, resulting in video-only approaches to mistake detection. However, in text-following activities, models cannot determine the correctness of some actions without referring to the texts. Additionally, current mistake datasets rarely use procedural texts for video recording except for cooking. To fill these gaps, this paper proposes the EgoOops dataset, where egocentric videos record erroneous activities when following procedural texts across diverse domains. It features three types of annotations: video-text alignment, mistake labels, and descriptions for mistakes. We also propose a mistake detection approach, combining video-text alignment and mistake label classification to leverage…
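To make the three annotation types concrete, the following is a minimal sketch of how one annotated video might be represented. It is not the dataset's actual schema: the `Segment` class, its field names, the `mistakes_by_step` helper, and the example steps and values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One annotated video segment (hypothetical layout, not the real schema)."""
    start_sec: float
    end_sec: float
    step_index: Optional[int]          # video-text alignment: index of the matching step
    is_mistake: bool                   # mistake label
    description: Optional[str] = None  # free-text description of the mistake


def mistakes_by_step(steps: List[str], segments: List[Segment]) -> Dict[str, List[str]]:
    """Group mistake descriptions under the procedural step they are aligned to."""
    report: Dict[str, List[str]] = {}
    for seg in segments:
        # Only aligned segments can be attributed to a step of the text.
        if seg.is_mistake and seg.step_index is not None:
            report.setdefault(steps[seg.step_index], []).append(seg.description or "")
    return report


# Toy procedural text and annotations, invented for this example.
steps = [
    "Attach the battery holder",
    "Connect the red wire",
    "Switch the circuit on",
]
segments = [
    Segment(0.0, 12.5, 0, False),
    Segment(12.5, 30.0, 1, True, "connected the wire to the wrong terminal"),
    Segment(30.0, 41.0, 2, False),
]

report = mistakes_by_step(steps, segments)
print(report)
```

The alignment field is what lets a mistake be tied back to a specific instruction; a video-only representation would have no `step_index` to group by.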

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization