Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

Jianglong Ye; Peng Wang; Kejie Li; Yichun Shi; Heng Wang

arXiv:2310.03020·cs.CV·March 18, 2024·1 cites

Open Access

TL;DR

Consistent-1-to-3 introduces a geometry-aware diffusion framework for zero-shot 3D view synthesis from a single image, ensuring high-quality, multi-view consistent 3D object representations.

Contribution

The paper proposes a novel two-stage generative framework with geometry-guided attention mechanisms for improved 3D consistency in single-image view synthesis.

Findings

01. Outperforms state-of-the-art methods in qualitative and quantitative metrics

02. Enables full 360-degree object visualization from a single image

03. Effectively incorporates geometric constraints through epipolar-guided attention

Abstract

Zero-shot novel view synthesis (NVS) from a single image is an essential problem in 3D object understanding. While recent approaches that leverage pre-trained generative models can synthesize high-quality novel views from in-the-wild inputs, they still struggle to maintain 3D consistency across different views. In this paper, we present Consistent-1-to-3, a generative framework that significantly mitigates this issue. Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions. We design a scene representation transformer and a view-conditioned diffusion model to perform these two stages, respectively. Inside the models, to enforce 3D consistency, we propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view…
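To make the idea of attention guided by epipolar geometry concrete, here is a minimal NumPy sketch. It is not the authors' implementation; it only illustrates one common way to restrict cross-view attention to keys near a query's epipolar line. The function names (`fundamental_matrix`, `pixel_grid`, `epipolar_distance`, `masked_attention`), the shared intrinsics `K`, the pose convention `X_src = R @ X_tgt + t`, and the hard pixel-distance threshold are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K, R, t):
    """F with x_src^T F x_tgt = 0, assuming X_src = R @ X_tgt + t
    and the same intrinsics K for both views (an assumption)."""
    K_inv = np.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv

def pixel_grid(h, w):
    """Homogeneous pixel coordinates (x, y, 1) for an h x w map, shape (h*w, 3)."""
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    ones = np.ones(h * w)
    return np.stack([xs.ravel(), ys.ravel(), ones], axis=1).astype(float)

def epipolar_distance(F, queries, keys):
    """Distance (in pixels) from every source-view key to the epipolar
    line of every target-view query. Returns shape (N_q, N_k)."""
    lines = queries @ F.T                      # row i is l_i = F @ x_i
    num = np.abs(lines @ keys.T)               # |l_i . k_j|
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-12
    return num / den

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention where masked-out keys get ~zero weight.
    A row with no allowed key falls back to uniform weights."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -1e9)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Toy usage: an 8 x 8 feature map per view with 16-dim random features.
K = np.array([[20.0, 0.0, 4.0], [0.0, 20.0, 4.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.0, 0.1])

F = fundamental_matrix(K, R, t)
grid = pixel_grid(8, 8)
mask = epipolar_distance(F, grid, grid) <= 1.0   # hypothetical 1-pixel band

rng = np.random.default_rng(0)
q = rng.normal(size=(64, 16))   # target-view (novel view) queries
k = rng.normal(size=(64, 16))   # source-view (input image) keys
v = rng.normal(size=(64, 16))   # source-view values
out, weights = masked_attention(q, k, v, mask)
```

The hard threshold is the simplest choice for a sketch; a soft weighting that decays with epipolar distance would be an equally plausible variant, and the abstract does not say which form the paper uses.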

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques

Methods: Diffusion