Learning 3D Scene Priors with 2D Supervision

Yinyu Nie; Angela Dai; Xiaoguang Han; Matthias Nießner

arXiv:2211.14157·cs.CV·November 28, 2022

TL;DR

This paper learns 3D scene priors of layout and shape from multi-view RGB images alone, so scene understanding and reconstruction do not depend on expensive 3D ground truth.

Contribution

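The method learns 3D scene priors from 2D supervision, removing the need for 3D ground truth. It represents a scene as a latent vector, which a trained autoregressive decoder progressively decodes into a sequence of objects, each with a class category, a 3D bounding box, and a mesh.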

Findings

01  Outperforms the state of the art in single-view reconstruction

02  Achieves top results in scene synthesis without 3D supervision

03  Effective in scene interpolation and reconstruction tasks

Abstract

Holistic 3D scene understanding entails estimation of both layout configuration and object geometry in a 3D environment. Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans), by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), for which collection at scale is expensive and often intractable. To address this shortcoming, we propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth. Instead, we rely on 2D supervision from multi-view RGB images. Our method represents a 3D scene as a latent vector, from which we can progressively decode to a sequence of objects characterized by their class categories, 3D bounding boxes, and meshes. With our trained autoregressive decoder representing the scene prior, our method facilitates many downstream applications, including scene…
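
The decoding idea in the abstract (a scene latent vector decoded progressively into a sequence of objects) can be illustrated with a minimal Python sketch. This is not the authors' model: `SceneObject`, `toy_step`, `decode_scene`, the `CLASSES` label set, and the placement rule are all hypothetical stand-ins, and the learned network is replaced by a fixed arithmetic rule. It only shows the control flow of autoregressive decoding, where each step sees the latent and the objects emitted so far and may signal a stop. Meshes are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

CLASSES = ("bed", "chair", "table")  # hypothetical label set, not from the paper


@dataclass(frozen=True)
class SceneObject:
    category: str
    center: Tuple[float, float, float]  # box center (x, y, z)
    size: Tuple[float, float, float]    # box extents along x, y, z
    # A real system would also carry a mesh; omitted in this toy sketch.


def toy_step(latent: Sequence[float],
             prev: List[SceneObject]) -> Optional[SceneObject]:
    """Hypothetical stand-in for one step of a learned decoder.

    Reads two latent entries per object; returns None as the stop signal.
    """
    k = len(prev)
    if 2 * k + 1 >= len(latent):
        return None  # latent exhausted: stop decoding
    a, b = abs(latent[2 * k]), abs(latent[2 * k + 1])
    category = CLASSES[int(a * 10) % len(CLASSES)]
    size = (0.5 + b, 0.5 + b, 0.5 + a)
    # Condition on the previous object: start where its box ends along x.
    start_x = prev[-1].center[0] + prev[-1].size[0] / 2 if prev else 0.0
    center = (start_x + size[0] / 2, 0.0, size[2] / 2)
    return SceneObject(category, center, size)


StepFn = Callable[[Sequence[float], List[SceneObject]], Optional[SceneObject]]


def decode_scene(latent: Sequence[float],
                 step: StepFn = toy_step,
                 max_objects: int = 8) -> List[SceneObject]:
    """Emit objects one at a time until the step function signals a stop."""
    objects: List[SceneObject] = []
    while len(objects) < max_objects:
        nxt = step(latent, objects)
        if nxt is None:
            break
        objects.append(nxt)
    return objects


if __name__ == "__main__":
    for obj in decode_scene([0.1, 0.2, 0.3, 0.4]):
        print(obj)
```

Because each step receives the objects already placed, later objects can depend on earlier ones. Here that dependency is only a side-by-side placement rule along x; in a trained decoder it would be learned from data.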

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage