Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

Dima Damen; Hazel Doughty; Giovanni Maria Farinella; Sanja Fidler; Antonino Furnari; Evangelos Kazakos; Davide Moltisanti; Jonathan Munro; Toby Perrett; Will Price; Michael Wray

arXiv:1804.02748 · cs.CV · August 1, 2018 · 293 cites

TL;DR

The paper introduces EPIC-KITCHENS, a large-scale egocentric video dataset capturing diverse daily kitchen activities, enabling advancements in understanding first-person interactions, actions, and object usage.

Contribution

It provides a comprehensive, annotated egocentric video dataset with diverse participants and environments, facilitating research in action recognition, object detection, and anticipation in first-person vision.

Findings

1. Established baseline results for action recognition and object detection.
2. Demonstrated the dataset's diversity and real-world applicability.
3. Highlighted challenges in egocentric video understanding.

Abstract

First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is…

Code & Models

Datasets

Gaugou/epic_kitchens_100
dataset · 12 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Visual Attention and Saliency Detection