IndEgo: A Dataset of Industrial Scenarios and Collaborative Work for Egocentric Assistants
Vivek Chavan, Yasmina Imgrund, Tung Dao, Sanwantri Bai, Bosong Wang, Ze Lu, Oliver Heimann, Jörg Krüger

TL;DR
IndEgo is a comprehensive multimodal dataset capturing industrial scenarios with egocentric and exocentric perspectives, designed to advance research in collaborative industrial tasks, mistake detection, and procedural understanding for assistive systems.
Contribution
The paper introduces IndEgo, a large-scale, multimodal dataset with detailed annotations for industrial tasks, including collaborative work, to facilitate research in egocentric assistive technologies.
Findings
Baseline models struggle with the dataset's complexity.
Rich multimodal data improves task understanding.
Challenges highlight the need for advanced multimodal models.
Abstract
We introduce IndEgo, a multimodal egocentric and exocentric dataset addressing common industrial tasks, including assembly/disassembly, logistics and organisation, inspection and repair, woodworking, and others. The dataset contains 3,460 egocentric recordings (approximately 197 hours), along with 1,092 exocentric recordings (approximately 97 hours). A key focus of the dataset is collaborative work, where two workers jointly perform cognitively and physically intensive tasks. The egocentric recordings include rich multimodal data and added context via eye gaze, narration, sound, motion, and others. We provide detailed annotations (actions, summaries, mistake annotations, narrations), metadata, processed outputs (eye gaze, hand pose, semi-dense point cloud), and benchmarks on procedural and non-procedural task understanding, Mistake Detection, and reasoning-based Question Answering.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Social Robot Interaction and HRI
