TL;DR
This paper introduces Supervision by Registration and Triangulation (SRT), an unsupervised method that uses unlabeled multi-view video to improve the accuracy and precision of landmark detectors without requiring additional manual annotations.
Contribution
The paper proposes a novel unsupervised training approach using registration and multi-view consistency, enabling learning from unlabeled data for landmark detection.
Findings
Improved landmark detection accuracy across 11 datasets.
Demonstrated precision gains with a new metric.
Effective use of unlabeled multi-view video data.
Abstract
We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors. Being able to utilize unlabeled data enables our detectors to learn from the massive amounts of freely available unlabeled data and not be limited by the quality and quantity of manual human annotations. To utilize unlabeled data, there are two key observations: (1) the detections of the same landmark in adjacent frames should be coherent with registration, i.e., optical flow; (2) the detections of the same landmark in multiple synchronized and geometrically calibrated views should correspond to a single 3D point, i.e., multi-view consistency. Registration and multi-view consistency are sources of supervision that do not require manual labeling, and thus they can be leveraged to augment existing training data…
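The two observations in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract does not specify the loss functions, so the linear (DLT) triangulation, the mean reprojection residual, and the function names `triangulate_dlt`, `reprojection_residual`, and `registration_residual` are all assumptions chosen for illustration. The camera matrices and the 3D point are made-up toy values.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one landmark from several calibrated views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (x, y) detections, one per view.
    Returns the 3D point that best satisfies all views in a least-squares sense.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, point_3d):
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix."""
    p = P @ np.append(point_3d, 1.0)
    return p[:2] / p[2]


def reprojection_residual(projections, points_2d, point_3d):
    """Mean pixel distance between detections and the reprojected 3D point.

    A small value means the per-view detections agree on a single 3D point
    (the multi-view consistency observation).
    """
    errors = [
        np.linalg.norm(project(P, point_3d) - np.asarray(uv, dtype=float))
        for P, uv in zip(projections, points_2d)
    ]
    return float(np.mean(errors))


def registration_residual(det_prev, det_next, flow_at_prev):
    """Pixel distance between the flow-propagated detection and the next-frame one.

    A small value means detections in adjacent frames are coherent with
    optical flow (the registration observation).
    """
    predicted = np.asarray(det_prev, dtype=float) + np.asarray(flow_at_prev, dtype=float)
    return float(np.linalg.norm(predicted - np.asarray(det_next, dtype=float)))


# Toy setup (hypothetical values): two cameras with the same intrinsics,
# the second one shifted along the x axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

landmark_3d = np.array([0.2, -0.1, 4.0])
detections = [project(P1, landmark_3d), project(P2, landmark_3d)]

recovered = triangulate_dlt([P1, P2], detections)
multi_view_error = reprojection_residual([P1, P2], detections, recovered)
temporal_error = registration_residual((10.0, 20.0), (12.0, 21.0), (2.0, 1.0))
```

With noise-free detections, `recovered` matches `landmark_3d` and both residuals are close to zero. In a training setting, residuals like these could act as label-free penalties on a detector's outputs, which is the kind of supervision the abstract describes.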
