Forecasting Action through Contact Representations from First Person Video
Eadom Dessalene, Chinmaya Devaraj, Michael Maynord, Cornelia Fermuller, and Yiannis Aloimonos

TL;DR
This paper introduces contact-based representations and models for egocentric video action anticipation, achieving state-of-the-art results on the EPIC Kitchens dataset by predicting future hand-object interactions.
Contribution
The paper proposes novel contact-centric representations and the Anticipation Module, which is integrated into Ego-OMG (Egocentric Object Manipulation Graphs) to improve action prediction in first-person videos.
Findings
Achieved 1st and 2nd place in the EPIC Kitchens Action Anticipation Challenge.
State-of-the-art results on action anticipation and prediction tasks.
Validated the utility of contact-based representations through ablation studies.
Abstract
Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action is reliant on anticipation of contact as is demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations we train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs…
