Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Jeremy Reizenstein; Roman Shapovalov; Philipp Henzler; Luca Sbordone; Patrick Labatut; David Novotny

arXiv:2109.00512·cs.CV·September 2, 2021·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces CO3D, a large-scale real-world dataset of multi-view object images annotated with camera poses and ground-truth 3D point clouds. The dataset supports a large-scale evaluation of 3D reconstruction methods and motivates NerFormer, a Transformer-based neural rendering method for reconstructing objects from few views.

Contribution

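The paper contributes a large-scale real-world dataset of 3D object categories, significantly larger than alternatives in both the number of categories and the number of objects, and introduces NerFormer, a Transformer-based neural rendering approach for 3D reconstruction from limited views.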

Findings

01

The CO3D dataset contains 1.5 million frames from nearly 19,000 videos, covering 50 MS-COCO categories.

02

The authors conduct one of the first large-scale "in-the-wild" evaluations of 3D reconstruction methods.

03

NerFormer outperforms existing methods in few-view 3D reconstruction.

Abstract

Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field by collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-scale dataset, called Common Objects in 3D, with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds. The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such, it is significantly larger than alternatives both in terms of the number of categories and objects. We exploit this new dataset to conduct one of the first large-scale "in-the-wild" evaluations of several…

Code & Models

Repositories

facebookresearch/co3d
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Softmax · Residual Connection · Layer Normalization