Egocentric Activity Recognition and Localization on a 3D Map
Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li

TL;DR
This paper introduces a deep probabilistic model that jointly recognizes and localizes egocentric actions within a 3D environment using a hierarchical volumetric representation and video data, advancing scene understanding.
Contribution
The novel model integrates 3D environment context with egocentric video to improve action recognition and localization in known and unknown environments.
Findings
Strong results on action recognition accuracy.
Effective 3D localization in diverse environments.
Generalizes well to unseen scenes.
Abstract
Given a video captured from a first-person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in 3D space? We address this challenging problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos. To this end, we propose a novel deep probabilistic model. Our model takes as input a Hierarchical Volumetric Representation (HVR) of the 3D environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and the contextual cues surrounding its potential locations. To evaluate our model, we conduct extensive experiments on a subset of the Ego4D dataset, in which both naturalistic human actions and photo-realistic 3D environment reconstructions are captured. Our method…
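Treating the location as a latent variable means the action prediction can be read as a mixture over candidate locations: p(a | video, map) = Σ_l p(l | video, map) · p(a | video, context(l)). The sketch below shows only that marginalization step. It assumes a small discrete set of candidate locations (for example, cells of a volumetric map) and precomputed scores for each; the function names and inputs are hypothetical, and this is not the authors' implementation, which the abstract describes as a deep model over an HVR.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_action_probs(
    location_logits: Sequence[float],
    action_logits_per_location: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical helper: marginalize action probabilities over a latent location.

    location_logits[l]               -- assumed score for candidate location l
    action_logits_per_location[l][a] -- assumed action score given the context at l
    Returns p(a) = sum_l p(l) * p(a | l).
    """
    if len(location_logits) != len(action_logits_per_location):
        raise ValueError("need one row of action scores per candidate location")
    p_loc = softmax(location_logits)
    num_actions = len(action_logits_per_location[0])
    p_action = [0.0] * num_actions
    for weight, logits in zip(p_loc, action_logits_per_location):
        p_a_given_l = softmax(logits)
        for a in range(num_actions):
            p_action[a] += weight * p_a_given_l[a]
    return p_action


# Two candidate locations; the first is far more likely and favors action 0.
print(marginal_action_probs([100.0, -100.0], [[10.0, 0.0], [0.0, 10.0]]))
```

Because each p(a | l) is a distribution and the location weights sum to one, the result is itself a valid distribution; when the location posterior is uncertain, the action prediction blends the contextual evidence from all candidate locations rather than committing to one.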
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
