Accidental Turntables: Learning 3D Pose by Watching Objects Turn

Zezhou Cheng; Matheus Gadelha; Subhransu Maji

arXiv:2212.06300·cs.CV·December 14, 2022

TL;DR

This paper trains single-image 3D object pose estimators on in-the-wild videos of objects turning. It combines structure-from-motion with a multi-stage training scheme and requires no pose labels.

Contribution

It presents a training method for 3D pose estimation that uses videos of objects turning as supervision, and it contributes the Accidental Turntables Dataset for benchmarking.

Findings

01 Achieves performance competitive with the existing state of the art on standard 3D pose estimation benchmarks.

02 Does not require pose labels during training.

03 Provides a new dataset with over 41,000 images.

Abstract

We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with the recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves competitive performance with respect to existing state-of-the-art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute an Accidental Turntables Dataset,…
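
To make "relative 3D pose" concrete, here is a minimal sketch of how the rotation between two video frames can be computed and measured, assuming each frame already has a 3x3 rotation matrix (for example, as a structure-from-motion pipeline might report). The function names and the toy turntable rotations are illustrative assumptions, not taken from the paper, and this is not the authors' implementation.

```python
import numpy as np

def rotation_about_y(theta_deg):
    """Hypothetical helper: rotation by theta_deg about the vertical (y) axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def relative_rotation(R_i, R_j):
    """Rotation that takes frame i's orientation to frame j's orientation."""
    return R_j @ R_i.T

def geodesic_angle_deg(R):
    """Angle of rotation R in degrees, from the identity trace(R) = 1 + 2*cos(angle)."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.rad2deg(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Toy "turntable": the object is seen at 10 and 55 degrees of turn.
R_a = rotation_about_y(10.0)
R_b = rotation_about_y(55.0)
angle = geodesic_angle_deg(relative_rotation(R_a, R_b))
print(f"relative rotation between frames: {angle:.2f} degrees")
```

The same geodesic angle is a common way to score a predicted rotation against a reference one, which is why it appears here as a single reusable function.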

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications