Future Person Localization in First-Person Videos
Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato

TL;DR
This paper introduces a new task of predicting future locations of people in first-person videos, leveraging ego-motion, scale cues, and pose information to improve localization accuracy.
Contribution
It proposes a novel prediction framework that integrates multiple cues specific to first-person videos for future person localization.
Findings
Demonstrated effectiveness on a newly introduced dataset and on a public social interaction dataset.
Outperforms baseline methods in future person localization.
Highlights importance of ego-motion, scale, and pose cues in first-person videos.
Abstract
We present a new task that predicts future locations of people observed in first-person videos. Consider a first-person video stream continuously recorded by a wearable camera. Given a short clip of a person that is extracted from the complete stream, we aim to predict that person's location in future frames. To facilitate this future person localization ability, we make the following three key observations: a) First-person videos typically involve significant ego-motion which greatly affects the location of the target person in future frames; b) Scales of the target person act as a salient cue to estimate a perspective effect in first-person videos; c) First-person videos often capture people up-close, making it easier to leverage target poses (e.g., where they look) for predicting their future locations. We incorporate these three observations into a prediction framework with a…
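The task described above takes a short observed clip of a person and asks for that person's locations in future frames. The sketch below is a minimal illustration under stated assumptions, not the authors' framework (whose architecture is truncated in the abstract above). It shows one way to stack the three cues (location with scale, pose, and ego-motion) frame by frame into a single input sequence. It also includes a naive constant-velocity extrapolation that uses location alone. The names `ObservedTrack`, `stacked_features`, and `constant_velocity_baseline`, and the feature dimensions, are hypothetical. The constant-velocity rule is a generic trajectory baseline swapped in for illustration and is not claimed to be one of the paper's baselines.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObservedTrack:
    """Hypothetical container for per-frame cues over T_obs observed frames."""

    locations: np.ndarray   # (T_obs, 2) image coordinates of the target person
    scales: np.ndarray      # (T_obs, 1) apparent size of the person (scale cue)
    poses: np.ndarray       # (T_obs, P) flattened pose keypoints (pose cue)
    ego_motion: np.ndarray  # (T_obs, E) per-frame camera motion (ego-motion cue)

    def stacked_features(self) -> np.ndarray:
        """Concatenate all cues frame by frame into one (T_obs, 2 + 1 + P + E) sequence."""
        return np.concatenate(
            [self.locations, self.scales, self.poses, self.ego_motion], axis=1
        )


def constant_velocity_baseline(locations: np.ndarray, t_future: int) -> np.ndarray:
    """Extrapolate the last observed displacement t_future frames ahead.

    This uses location only and ignores ego-motion, scale, and pose, which
    the abstract argues are informative in first-person videos.
    """
    velocity = locations[-1] - locations[-2]       # last per-frame displacement, shape (2,)
    steps = np.arange(1, t_future + 1)[:, None]    # (t_future, 1) step counts 1..t_future
    return locations[-1] + steps * velocity        # (t_future, 2) predicted locations
```

For example, `constant_velocity_baseline(track.locations, t_future=10)` returns a `(10, 2)` array of predicted image coordinates. A learned model would instead consume `track.stacked_features()`, letting camera motion, perspective (via scale), and body orientation (via pose) shape the prediction.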
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
