TL;DR
The MECCANO dataset provides a new egocentric video benchmark for understanding human-object interactions in industrial-like settings, enabling research on action recognition and object detection from a first-person perspective.
Contribution
We introduce MECCANO, the first egocentric video dataset for studying human-object interactions in industrial-like settings. Each interaction is annotated both temporally (action segments) and spatially (active object bounding boxes).
Findings
Baseline results indicate that the dataset is challenging for egocentric human-object interaction tasks.
The dataset supports four tasks: action recognition, active object detection, active object recognition, and egocentric human-object interaction detection.
Public release of the dataset facilitates further research in egocentric industrial applications.
Abstract
Wearable cameras make it possible to collect images and videos of humans interacting with the world. While human-object interactions have been thoroughly investigated in third-person vision, the problem has been understudied in egocentric settings and in industrial scenarios. To fill this gap, we introduce MECCANO, the first dataset of egocentric videos to study human-object interactions in industrial-like settings. MECCANO was acquired by 20 participants who were asked to build a motorbike model, for which they had to interact with tiny objects and tools. The dataset has been explicitly labeled for the task of recognizing human-object interactions from an egocentric perspective. Specifically, each interaction has been labeled both temporally (with action segments) and spatially (with active object bounding boxes). With the proposed dataset, we investigate four different tasks including 1)…
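To make the two annotation types concrete, here is a minimal Python sketch of how one labeled interaction could be represented: a temporal action segment plus a set of active object bounding boxes. This is not the dataset's actual file format. The class names, field names, box convention (top-left corner plus width and height, in pixels), and the example labels and numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActiveObjectBox:
    """Hypothetical spatial label: one active object box on one frame."""
    frame: int      # frame index within the video
    label: str      # object class name (illustrative, not from the dataset)
    x: float        # top-left corner, in pixels (assumed convention)
    y: float
    width: float
    height: float


@dataclass
class Interaction:
    """Hypothetical record: a temporal action segment with its spatial boxes."""
    action: str         # action label (illustrative, not from the dataset)
    start_frame: int    # first frame of the segment, inclusive
    end_frame: int      # last frame of the segment, inclusive
    boxes: List[ActiveObjectBox] = field(default_factory=list)

    def boxes_in_segment(self) -> List[ActiveObjectBox]:
        """Return only the boxes whose frame lies inside the action segment."""
        return [b for b in self.boxes
                if self.start_frame <= b.frame <= self.end_frame]


def interactions_at(frame: int, interactions: List[Interaction]) -> List[Interaction]:
    """Return every interaction whose temporal segment covers the given frame."""
    return [i for i in interactions if i.start_frame <= frame <= i.end_frame]


# Made-up example values, only to show how the pieces fit together.
example = Interaction(
    action="take screwdriver",
    start_frame=120,
    end_frame=180,
    boxes=[
        ActiveObjectBox(frame=130, label="screwdriver",
                        x=310.0, y=220.0, width=64.0, height=40.0),
        ActiveObjectBox(frame=200, label="screwdriver",
                        x=300.0, y=210.0, width=60.0, height=42.0),
    ],
)
```

Keeping the segment and the boxes in one record means a temporal query (which interactions cover a frame) and a spatial query (which objects are active during a segment) can both be answered from the same structure.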
