Accidental Turntables: Learning 3D Pose by Watching Objects Turn
Zezhou Cheng, Matheus Gadelha, Subhransu Maji

TL;DR
This paper introduces a novel approach for 3D object pose estimation from single images by leveraging in-the-wild videos of objects turning, using structure-from-motion and a multi-stage training scheme, without pose labels.
Contribution
It presents a new training method for 3D pose estimation using videos of objects turning and introduces a large, challenging dataset for benchmarking.
Findings
Achieves performance competitive with existing state-of-the-art methods on standard 3D pose estimation benchmarks.
Does not require pose labels during training.
Provides a new dataset with over 41,000 images.
Abstract
We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with the recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves competitive performance with respect to existing state-of-the-art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute an Accidental Turntables Dataset,…
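To make the notion of "relative 3D pose" between two video frames concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: it assumes each frame already has a 3x3 rotation matrix (for example, from a structure-from-motion reconstruction), and all function names are hypothetical. It computes the rotation between two frames and the angle of that rotation, a quantity commonly used to compare orientations.

```python
import math

def rotation_about_y(angle_deg):
    """3x3 rotation matrix for a turn of angle_deg degrees about the y axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def transpose(R):
    """Transpose of a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(row) for row in zip(*R)]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def relative_rotation(R_i, R_j):
    """Rotation that takes frame i's orientation to frame j's: R_j * R_i^T."""
    return matmul(R_j, transpose(R_i))

def geodesic_angle_deg(R):
    """Rotation angle of R in degrees, from trace(R) = 1 + 2*cos(theta)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.degrees(math.acos(cos_theta))

# Illustrative values only: an object seen at 10 and 55 degrees of turn.
R_a = rotation_about_y(10.0)
R_b = rotation_about_y(55.0)
angle = geodesic_angle_deg(relative_rotation(R_a, R_b))
```

Relative rotations of this kind are only defined within a single video, which is why the abstract describes a separate stage that learns a canonical pose shared across videos.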
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications
