EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano, Ryo Hachiuma, Hideo Saito

TL;DR
EMAG is a novel approach for 2D hand forecasting from egocentric videos that effectively accounts for ego-motion and enhances generalization across diverse scenes and behaviors.
Contribution
The paper introduces EMAG, a method that incorporates ego-motion and multimodal data to improve 2D hand prediction accuracy and generalization in egocentric videos.
Findings
Outperforms prior methods by 1.7% intra-dataset
Achieves 7.0% improvement cross-dataset
Effectively handles ego-motion and scene variability
Abstract
Predicting future human behavior from egocentric videos is a challenging but critical task for human intention understanding. Existing methods for forecasting 2D hand positions rely on visual representations and mainly focus on hand-object interactions. In this paper, we investigate the hand forecasting task and tackle two significant issues that persist in the existing methods: (1) 2D hand positions in future frames are severely affected by ego-motions in egocentric videos; (2) prediction based on visual information tends to overfit to background or scene textures, posing a challenge for generalization on novel scenes or human behaviors. To solve the aforementioned problems, we propose EMAG, an ego-motion-aware and generalizable 2D hand forecasting method. In response to the first problem, we propose a method that considers ego-motion, represented by a sequence of homography matrices…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation
MethodsFocus
