Self-supervised Multi-view Person Association and Its Applications
Minh Vo, Ersin Yumer, Kalyan Sunkavalli, Sunil Hadap, Yaser Sheikh, and Srinivasa Narasimhan

TL;DR
This paper introduces a self-supervised framework for multi-view person association that improves cross-view association accuracy and the stability of 3D skeleton tracking in complex multi-camera social scenes, enabling better multi-angle video synthesis.
Contribution
It presents a novel self-supervised method to adapt person descriptors for multi-view tracking without labeled data, enhancing association accuracy and 3D tracking in challenging scenarios.
Findings
Up to 18% improvement in association accuracy
5 to 10 times more stable 3D skeleton tracking
Effective multi-angle video synthesis from multi-view data
Abstract
Reliable markerless motion tracking of people participating in a complex group activity from multiple moving cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. To solve this problem, reliable association of the same person across distant viewpoints and temporal instances is essential. We present a self-supervised framework to adapt a generic person appearance descriptor to the unlabeled videos by exploiting motion tracking, mutual exclusion constraints, and multi-view geometry. The adapted discriminative descriptor is used in a tracking-by-clustering formulation. We validate the effectiveness of our descriptor learning on WILDTRACK [14] and three new complex social scenes captured by multiple cameras with up to 60 people "in the wild". We report significant improvement in association accuracy (up to 18%) and…
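The abstract names three sources of free supervision (motion tracking, mutual exclusion, multi-view geometry) but does not say how they become a training signal. What follows is a minimal, hypothetical sketch and not the authors' implementation. It makes several assumptions:

- frames are synchronized (the paper itself targets asynchronous streams);
- each detection is reduced to one 2D image point;
- fundamental matrices between camera pairs are known;
- a triplet hinge loss is the adaptation objective.

The names `Detection`, `mine_pairs`, `epipolar_distance`, `triplet_loss`, `epi_thresh`, and `margin` are all illustrative.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    cam: int      # camera index
    frame: int    # time index (assumed synchronized across cameras)
    track: int    # tracklet id from a single-view motion tracker, unique per camera
    x: float      # one 2D image point per detection (hypothetical simplification)
    y: float


def epipolar_distance(F, a, b):
    """Pixel distance from b's point to the epipolar line F @ x_a in b's image."""
    xa = (a.x, a.y, 1.0)
    la, lb, lc = (sum(F[r][k] * xa[k] for k in range(3)) for r in range(3))
    norm = math.hypot(la, lb)
    if norm == 0.0:
        return 0.0  # degenerate line: treat as uninformative (never a negative)
    return abs(la * b.x + lb * b.y + lc) / norm


def _fundamental(fundamentals, a, b):
    """F mapping a's image points to epipolar lines in b's image, if known."""
    if (a.cam, b.cam) in fundamentals:
        return fundamentals[(a.cam, b.cam)]
    if (b.cam, a.cam) in fundamentals:
        # x_b^T F_ba^T x_a = 0, so the reverse direction uses the transpose
        return [list(row) for row in zip(*fundamentals[(b.cam, a.cam)])]
    return None


def mine_pairs(dets, fundamentals, epi_thresh=10.0):
    """Label detection pairs using only constraints that need no annotation.

    Returns (positives, negatives) as lists of index pairs; every other
    pair is left unlabeled.
    """
    pos, neg = [], []
    for i, j in combinations(range(len(dets)), 2):
        a, b = dets[i], dets[j]
        if a.cam == b.cam:
            if a.track == b.track:
                if a.frame != b.frame:
                    pos.append((i, j))  # motion tracking: same tracklet, same person
            elif a.frame == b.frame:
                neg.append((i, j))      # mutual exclusion: two people in one image
        elif a.frame == b.frame:
            F = _fundamental(fundamentals, a, b)
            if F is not None and epipolar_distance(F, a, b) > epi_thresh:
                neg.append((i, j))      # multi-view geometry: off the epipolar line
    return pos, neg


def triplet_loss(desc, pos, neg, margin=0.2):
    """Mean of max(0, d(anchor, pos) - d(anchor, neg) + margin) over mined triplets."""
    negs_of = {}
    for i, j in neg:
        negs_of.setdefault(i, []).append(j)
        negs_of.setdefault(j, []).append(i)
    losses = []
    for i, j in pos:
        for anchor, p in ((i, j), (j, i)):
            for n in negs_of.get(anchor, []):
                d_ap = math.dist(desc[anchor], desc[p])
                d_an = math.dist(desc[anchor], desc[n])
                losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

Pairs that no constraint certifies stay unlabeled, so the mined set errs on the side of precision. In a full system the descriptors would presumably come from a network updated to lower this loss. The adapted embeddings would then feed the tracking-by-clustering step.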
