MBW: Multi-view Bootstrapping in the Wild
Mosam Dabhi, Chaoyang Wang, Tim Clifford, Laszlo Attila Jeni, Ian R. Fasel, Simon Lucey

TL;DR
This paper introduces MBW, a method that uses uncalibrated, handheld cameras and a neural prior to accurately estimate 2D and 3D landmarks of articulated objects with minimal supervision, enabling scalable analysis in natural settings.
Contribution
MBW combines a non-rigid 3D neural prior with deep flow to achieve high-fidelity landmark estimation from uncalibrated videos with only a few annotations, advancing beyond rigid, calibrated camera methods.
Findings
Achieves 2D landmark accuracy comparable to fully supervised methods with only 1-2% labeled frames.
Provides 3D reconstructions of articulated objects in natural, unstructured environments.
Demonstrates versatility across diverse species and objects in casual zoo videos.
Abstract
Labeling articulated objects in unconstrained settings has a wide variety of applications including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these…
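The abstract's point that multiple views can expose landmark errors can be illustrated with a small sketch. This is not the MBW method (which uses a non-rigid 3D neural prior and deep flow); it swaps in a classical rank-3 affine factorization check as a stand-in, assuming affine cameras and synchronized views of one time instant. The function name rank3_residual and the array layout are hypothetical choices made for this illustration.

```python
import numpy as np

def rank3_residual(W):
    """Multi-view consistency residual under an ASSUMED affine camera model.

    W: array of shape (V, P, 2) holding 2D landmarks for one time instant,
       seen from V uncalibrated views with P landmarks each.
    Returns an array of shape (V, P): the distance between each observed
    landmark and its best rank-3 (single 3D shape) explanation.
    """
    V, P, _ = W.shape
    # Build the 2V x P measurement matrix: rows are x and y of each view.
    M = W.transpose(0, 2, 1).reshape(2 * V, P)
    # Subtract each row's centroid, which removes the per-view translation.
    M = M - M.mean(axis=1, keepdims=True)
    # Views of one 3D shape under affine cameras give a matrix of rank <= 3.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M3 = (U[:, :3] * s[:3]) @ Vt[:3]
    # Per-view, per-landmark 2D residual magnitude.
    return np.linalg.norm((M - M3).reshape(V, 2, P), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape3d = rng.normal(size=(3, 12))  # synthetic 3D landmarks
    views = [(rng.normal(size=(2, 3)) @ shape3d + rng.normal(size=(2, 1))).T
             for _ in range(3)]         # three synthetic affine views
    W = np.stack(views)
    W[1, 4] += 0.5                      # perturb one landmark in one view
    print(rank3_residual(W).round(3))
```

Nonzero residuals indicate that the views disagree about the underlying 3D shape, so such frames could be candidates for re-labeling. A plain SVD fit can absorb a gross outlier, so a practical pipeline would likely need a robust estimator rather than this check alone.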
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Advanced Vision and Imaging
