Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
Francesco Ragusa, Michele Mazzamuto, Rosario Forte, Irene D'Ambra, James Fort, Jakob Engel, Antonino Furnari, Giovanni Maria Farinella

TL;DR
Ego-EXTRA is a new egocentric video-language dataset capturing expert-trainee interactions during procedural tasks, designed to evaluate multimodal models in providing expert assistance from a first-person perspective.
Contribution
The paper introduces Ego-EXTRA, a novel egocentric dataset with high-quality dialogue and visual data for benchmarking video-language models in expert assistance scenarios.
Findings
Current models struggle with the complexity of egocentric dialogue tasks.
Ego-EXTRA provides a challenging benchmark highlighting limitations of existing multimodal models.
The dataset enables future research in egocentric video-language understanding and assistance.
Abstract
We present Ego-EXTRA, a video-language Egocentric Dataset for EXpert-TRAinee assistance. Ego-EXTRA features 50 hours of unscripted egocentric videos of subjects performing procedural activities (the trainees) while guided by real-world experts who provide guidance and answer specific questions using natural language. Following a "Wizard of Oz" data collection paradigm, the expert enacts a wearable intelligent assistant, looking at the activities performed by the trainee exclusively from their egocentric point of view, answering questions when asked by the trainee, or proactively interacting with suggestions during the procedures. This unique data collection protocol enables Ego-EXTRA to capture a high-quality dialogue in which expert-level feedback is provided to the trainee. Two-way dialogues between experts and trainees are recorded, transcribed, and used to create a novel benchmark…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Social Robot Interaction and HRI
