CaptainCook4D: A Dataset for Understanding Errors in Procedural Activities
Rohith Peddi, Shivvrat Arya, Bharath Challa, Likhitha Pallapothula, Akshay Vyas, Bhavya Gouripeddi, Jikai Wang, Qifan Zhang, Vasundhara Komaragiri, Eric Ragan, Nicholas Ruozzi, Yu Xiang, Vibhav Gogate

TL;DR
CaptainCook4D is a comprehensive egocentric 4D dataset capturing real-world kitchen activities, designed to facilitate research on error detection, activity localization, and procedural understanding in complex, goal-oriented tasks.
Contribution
The paper introduces CaptainCook4D, an egocentric 4D dataset of kitchen activities with step-level and action-level annotations, intended to support research on recognizing and localizing errors in procedural tasks.
Findings
Dataset includes 384 recordings totaling 94.5 hours.
Contains 5.3K step annotations and 10K action annotations.
Benchmarks established for error recognition, localization, and procedure learning.
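To give a rough sense of scale, the headline figures above can be turned into per-recording averages. The sketch below is only back-of-the-envelope arithmetic: it assumes the rounded counts "5.3K" and "10K" mean about 5,300 and 10,000, and the function and constant names are illustrative, not part of the dataset or its tooling.

```python
# Figures quoted in the Findings list. The annotation counts are rounded
# in the source ("5.3K", "10K"), so the derived averages are approximate.
NUM_RECORDINGS = 384
TOTAL_HOURS = 94.5
APPROX_STEP_ANNOTATIONS = 5_300     # assumed reading of "5.3K"
APPROX_ACTION_ANNOTATIONS = 10_000  # assumed reading of "10K"


def per_recording_averages(num_recordings, total_hours, steps, actions):
    """Return approximate per-recording averages (hypothetical helper)."""
    return {
        "minutes": total_hours * 60 / num_recordings,
        "steps": steps / num_recordings,
        "actions": actions / num_recordings,
    }


averages = per_recording_averages(
    NUM_RECORDINGS,
    TOTAL_HOURS,
    APPROX_STEP_ANNOTATIONS,
    APPROX_ACTION_ANNOTATIONS,
)

for name, value in averages.items():
    print(f"average {name} per recording: {value:.1f}")
```

Under these assumptions a recording runs a little under 15 minutes on average and carries roughly 14 step annotations and 26 action annotations.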
Abstract
Following step-by-step procedures is an essential component of various activities carried out by individuals in their daily lives. These procedures serve as a guiding framework that helps to achieve goals efficiently, whether it is assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and the ability to reason about the structure of the activity. To this end, we collect a new egocentric 4D dataset, CaptainCook4D, comprising 384 recordings (94.5 hours) of people performing recipes in real kitchen environments. This dataset consists of two distinct types of activity: one in which participants adhere to the provided recipe instructions…
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Building Energy and Comfort Optimization
