Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos
Mingze Xu, Chenyou Fan, Yuchen Wang, Michael S Ryoo, David J Crandall

TL;DR
This paper introduces a joint approach for segmenting and identifying people across synchronized multi-view videos, including first- and third-person perspectives, that improves accuracy without relying on ground-truth bounding boxes.
Contribution
It proposes a novel method that performs person segmentation and identification simultaneously across multiple synchronized videos, exploiting the fact that each task helps the other.
Findings
Significantly outperforms state-of-the-art methods on challenging datasets.
Joint segmentation and identification improve accuracy in multi-view scenarios.
The method handles both third- and first-person videos effectively.
Abstract
In a world of pervasive cameras, public spaces are often captured from multiple perspectives by cameras of different types, both fixed and mobile. An important problem is to organize these heterogeneous collections of videos by finding connections between them, such as identifying correspondences between the people appearing in the videos and the people holding or wearing the cameras. In this paper, we wish to solve two specific problems: (1) given two or more synchronized third-person videos of a scene, produce a pixel-level segmentation of each visible person and identify corresponding people across different views (i.e., determine who in camera A corresponds with whom in camera B), and (2) given one or more synchronized third-person videos as well as a first-person video taken by a mobile or wearable camera, segment and identify the camera wearer in the third-person videos. Unlike…
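Problem (1) in the abstract asks which person in camera A corresponds to which person in camera B. The sketch below illustrates only that final matching step, under assumptions that do not come from the paper: each segmented person is already summarized by a hypothetical appearance feature vector, similarity is cosine similarity, and the correspondence is a one-to-one assignment that maximizes total similarity, found here by brute force. This is not the authors' method, which learns segmentation and identification jointly. It only makes the "who corresponds with whom" output concrete.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_people(feats_a, feats_b):
    """Match people in camera A to people in camera B (hypothetical helper).

    feats_a, feats_b: lists of per-person feature vectors (assumed inputs).
    Returns (i, j) pairs meaning person i in A corresponds to person j in B,
    chosen to maximize the summed cosine similarity. Exhaustive search, so it
    is only practical for a handful of people. Assumes A has no more people
    than B; any extra people in B are left unmatched.
    """
    if len(feats_a) > len(feats_b):
        raise ValueError("camera A must not have more people than camera B")
    sim = [[cosine(a, b) for b in feats_b] for a in feats_a]
    best_score, best_pairs = -math.inf, []
    # Try every injective mapping from A's people to B's people.
    for cols in itertools.permutations(range(len(feats_b)), len(feats_a)):
        score = sum(sim[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(cols))
    return best_pairs


# Toy example with made-up 3-D features for two people seen by each camera.
cam_a = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
cam_b = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.3]]
print(match_people(cam_a, cam_b))  # [(0, 1), (1, 0)]
```

For more than a few people, the brute-force loop would normally be replaced by a polynomial-time assignment solver such as the Hungarian algorithm. In the paper's setting, the quality of the per-person features, not the matching step, is where most of the difficulty lies.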
