Multiview Supervision By Registration
Yilun Zhang, Hyun Soo Park

TL;DR
This paper introduces a semi-supervised multiview learning framework that combines geometric, temporal, and visibility cues to train keypoint detectors with minimal labeled data, outperforming existing methods.
Contribution
It proposes a novel end-to-end neural network leveraging multiview geometry, optical flow, and view visibility for semi-supervised keypoint detection with limited labels.
Findings
Outperforms existing keypoint detectors including DeepLabCut.
Trains effectively with less than 4% of the data labeled.
Demonstrates robustness across species like monkeys, dogs, and mice.
Abstract
This paper presents a semi-supervised learning framework to train a keypoint detector using multiview image streams given limited labeled data (typically 4%). We leverage the complementary relationship between multiview geometry and visual tracking to provide three types of supervisory signals that make use of the unlabeled data: (1) keypoint detection in one view can be supervised by other views via epipolar geometry; (2) a keypoint moves smoothly over time, so its optical flow can be used to temporally supervise consecutive image frames against each other; (3) a keypoint visible in one view is likely to be visible in the adjacent view. We integrate these three signals in a differentiable fashion to design a new end-to-end neural network composed of three pathways. This design allows us to extensively use the unlabeled data to train the keypoint detector. We show that our approach…
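As an illustration of signal (1), the sketch below computes the distance from a keypoint detected in a second view to the epipolar line induced by the same keypoint in a first view. It is a generic textbook formulation, not the loss used in the paper; the function name `epipolar_distance`, the toy fundamental matrix, and the example coordinates are all assumptions made for illustration. A small distance means the two detections are geometrically consistent, which is the kind of agreement a cross-view supervision term could reward.

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance from x2 (view 2) to the epipolar line of x1 (view 1).

    F  : 3x3 fundamental matrix with the convention x2_h^T @ F @ x1_h = 0
    x1 : (u, v) keypoint location in view 1
    x2 : (u, v) keypoint location in view 2
    Hypothetical helper for illustration only.
    """
    x1_h = np.append(np.asarray(x1, dtype=float), 1.0)  # homogeneous coordinates
    x2_h = np.append(np.asarray(x2, dtype=float), 1.0)
    line = F @ x1_h                                      # epipolar line (a, b, c) in view 2
    return abs(x2_h @ line) / np.hypot(line[0], line[1])

# Toy setup (assumed): identity intrinsics, no rotation, translation along x.
# Then F = [t]_x with t = (1, 0, 0), and epipolar lines are horizontal.
F_toy = np.array([[0.0, 0.0,  0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0,  0.0]])

d_consistent = epipolar_distance(F_toy, (0.2, 0.3), (0.5, 0.3))    # same row
d_inconsistent = epipolar_distance(F_toy, (0.2, 0.3), (0.5, 0.5))  # 0.2 off the line
```

In this toy configuration the distance reduces to the vertical offset between the two detections, so `d_consistent` is zero and `d_inconsistent` equals 0.2.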
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Human Pose and Action Recognition
