Identifying First-person Camera Wearers in Third-person Videos

Chenyou Fan; Jangwon Lee; Mingze Xu; Krishna Kumar Singh; Yong Jae Lee; David J. Crandall; Michael S. Ryoo

arXiv:1704.06340 · cs.CV · April 24, 2017 · 5 cites

TL;DR

This paper introduces a semi-Siamese CNN architecture that learns a joint embedding space to match first- and third-person videos, enabling person identification across different camera perspectives in complex scenes.

Contribution

The paper proposes a novel semi-Siamese CNN with a triplet loss for cross-view person matching, addressing a previously unexplored challenge.

Findings

01. Significantly outperforms baseline methods in matching accuracy.

02. Effectively learns features optimized for cross-view person identification.

03. Demonstrates robustness in complex multi-person scenarios.

Abstract

We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while…
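
The embedding-and-triplet idea in the abstract can be illustrated with a small sketch. This is not the paper's loss, whose full definition is not given here; it is the standard triplet margin loss on plain vectors, assuming Euclidean distance and a margin of 1.0. The function names (`euclidean`, `triplet_margin_loss`, `best_match`) and the toy 2-D embeddings are hypothetical and stand in for the outputs of the first- and third-person network branches.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumption, not the paper's exact loss).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def best_match(query, candidates):
    """Index of the candidate embedding nearest to the query embedding."""
    return min(range(len(candidates)), key=lambda i: euclidean(query, candidates[i]))

# Toy embeddings (hypothetical): one first-person clip and two people
# seen in the third-person view.
ego = [0.0, 0.0]      # embedding of the first-person video
wearer = [0.0, 1.0]   # third-person embedding of the true camera wearer
other = [3.0, 4.0]    # third-person embedding of a different person

loss_easy = triplet_margin_loss(ego, wearer, other)  # correct pair already close
loss_hard = triplet_margin_loss(ego, other, wearer)  # roles swapped: large loss
matched = best_match(ego, [other, wearer])           # nearest third-person candidate
```

At test time, under these assumptions, identifying the camera wearer reduces to a nearest-neighbor lookup in the shared space, which is what `best_match` does.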

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging