Unsupervised Learning of 3D Object Categories from Videos in the Wild
Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny

TL;DR
This paper introduces a new neural network model and dataset for unsupervised 3D object reconstruction from videos in the wild, overcoming challenges of real-world data without manual annotations.
Contribution
It presents a novel dataset of object-centric videos and a new neural network architecture, warp-conditioned ray embedding (WCR), for improved unsupervised 3D reconstruction from multiple views.
Findings
WCR significantly improves reconstruction quality on real videos, where existing mesh, voxel, and implicit-surface techniques fail.
The dataset enables benchmarking in real-world conditions.
The method outperforms existing monocular reconstruction baselines.
Abstract
Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D. While several recent works have obtained analogous results using synthetic data or assuming the availability of 2D primitives such as keypoints, we are interested in working with challenging real data and with no manual annotations. We thus focus on learning a model from multiple views of a large collection of object instances. We contribute a new large dataset of object-centric videos suitable for training and benchmarking this class of models. We show that existing techniques leveraging meshes, voxels, or implicit surfaces, which work well for reconstructing isolated objects, fail on this challenging data. Finally, we propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction while…
