On the generalization of learning-based 3D reconstruction

Miguel Angel Bautista; Walter Talbott; Shuangfei Zhai; Nitish Srivastava; Joshua M Susskind

arXiv:2006.15427·cs.CV·June 30, 2020

TL;DR

This paper studies how inductive biases encoded in the model architecture affect the ability of learning-based monocular 3D reconstruction methods to generalize to object categories unseen during training, and proposes mechanisms to enforce those biases.

Contribution

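The paper identifies three inductive biases that influence generalization and introduces two mechanisms to enforce them: a point representation that is aware of camera position, and a variance cost to aggregate information across views. The resulting model achieves state-of-the-art results on the standard ShapeNet 3D reconstruction benchmark.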

Findings

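01  Achieves state-of-the-art results on the standard ShapeNet 3D reconstruction benchmark

02  Identifies three inductive biases affecting generalization: the spatial extent of the encoder, the use of the underlying scene geometry to describe point features, and the mechanism for aggregating information from multiple views

03  Proposes mechanisms to enforce these biases and improve reconstruction of unseen categories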

Abstract

State-of-the-art learning-based monocular 3D reconstruction methods learn priors over object categories on the training set, and as a result struggle to achieve reasonable generalization to object categories unseen during training. In this paper we study the inductive biases encoded in the model architecture that impact the generalization of learning-based 3D reconstruction methods. We find that 3 inductive biases impact performance: the spatial extent of the encoder, the use of the underlying geometry of the scene to describe point features, and the mechanism to aggregate information from multiple views. Additionally, we propose mechanisms to enforce those inductive biases: a point representation that is aware of camera position, and a variance cost to aggregate information across views. Our model achieves state-of-the-art results on the standard ShapeNet 3D reconstruction benchmark in…
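The abstract names a variance cost for aggregating information across views but, as truncated here, does not define it. The sketch below is only a generic illustration of variance-based aggregation, not the authors' implementation. It assumes each view contributes one feature vector for the same 3D query point. The function name `variance_aggregate` and the input layout are hypothetical.

```python
from statistics import fmean, pvariance


def variance_aggregate(view_features):
    """Aggregate per-view features of ONE 3D point (illustrative sketch).

    view_features[v][c] is channel c of the feature observed from view v.
    This layout is an assumption made for the example, not taken from the paper.
    Returns (mean, variance), each a list with one entry per channel.
    """
    if not view_features:
        raise ValueError("need at least one view")
    # Regroup the data so each tuple holds one channel across all views.
    channels = list(zip(*view_features))
    mean = [fmean(ch) for ch in channels]
    # Population variance across views: low when the views agree on the point.
    var = [pvariance(ch) for ch in channels]
    return mean, var


# Two views that agree on the point's features: variance is zero.
agree_mean, agree_var = variance_aggregate([[1.0, 2.0], [1.0, 2.0]])

# Two views that disagree: variance grows with the disagreement.
disagree_mean, disagree_var = variance_aggregate([[0.0, 0.0], [2.0, 4.0]])
```

The intuition behind this kind of cost is that a point lying on the true surface should look consistent from every view, so low variance across views is evidence of a correct location. Whether the paper uses the variance in exactly this way is not stated in the text above.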
