Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, David Novotny

TL;DR
This paper introduces CO3D, a large-scale real-world dataset of multi-view object images annotated with camera poses and 3D point clouds, uses it to evaluate 3D reconstruction methods in the wild, and proposes a Transformer-based neural rendering method for 3D reconstruction from few views.
Contribution
The paper contributes a large-scale real-world dataset of 3D object categories, significantly larger than existing alternatives in both categories and objects, and introduces NerFormer, a Transformer-based neural rendering approach for 3D reconstruction from limited views.
Findings
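The CO3D dataset contains 1.5 million frames from nearly 19,000 videos, covering objects from 50 MS-COCO categories.
The authors use the dataset to conduct one of the first large-scale in-the-wild evaluations of 3D reconstruction methods.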
NerFormer outperforms existing methods in few-view 3D reconstruction.
Abstract
Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field by collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-scale dataset, called Common Objects in 3D, with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds. The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such, it is significantly larger than alternatives both in terms of the number of categories and objects. We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several…
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Softmax · Residual Connection · Layer Normalization
