NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

Lingen Li; Zhaoyang Zhang; Yaowei Li; Jiale Xu; Wenbo Hu; Xiaoyu Li; Weihao Cheng; Jinwei Gu; Tianfan Xue; Ying Shan

arXiv:2412.03517·cs.CV·December 9, 2024

TL;DR

NVComposer is a generative approach to novel view synthesis (NVS) that needs no explicit external pose alignment: it infers the spatial relationships between input views implicitly, using an image-pose dual-stream diffusion model and a geometry-aware feature alignment module.

Contribution

The paper proposes NVComposer, a method that removes the external alignment requirement in multi-view NVS by jointly generating target views and condition camera poses and by distilling geometric priors during training.
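
One way to picture a training objective that combines joint view-and-pose generation with geometric distillation is as a weighted sum of three terms. The sketch below is an illustration only, not the authors' implementation: the function names, the specific loss forms (mean squared error on predicted noise, cosine alignment between features), and the weights `pose_weight` and `align_weight` are all assumptions that do not come from the paper.

```python
import numpy as np


def cosine_alignment_loss(student_feats, teacher_feats, eps=1e-8):
    """Mean (1 - cosine similarity) over per-token feature vectors.

    Both inputs have shape (num_tokens, dim). The "teacher" stands in for
    features from a pretrained geometry model (hypothetical here).
    """
    s = student_feats / (np.linalg.norm(student_feats, axis=-1, keepdims=True) + eps)
    t = teacher_feats / (np.linalg.norm(teacher_feats, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))


def joint_loss(image_noise_pred, image_noise,
               pose_noise_pred, pose_noise,
               student_feats, teacher_feats,
               pose_weight=1.0, align_weight=0.1):
    """Hypothetical combined objective: image stream + pose stream + alignment."""
    # Denoising error on the image stream (target novel views).
    image_term = float(np.mean((image_noise_pred - image_noise) ** 2))
    # Denoising error on the pose stream (camera poses of the condition views).
    pose_term = float(np.mean((pose_noise_pred - pose_noise) ** 2))
    # Pull intermediate features toward the geometry teacher's features.
    align_term = cosine_alignment_loss(student_feats, teacher_feats)
    return image_term + pose_weight * pose_term + align_weight * align_term
```

Under these assumptions, the alignment term matters only during training; at inference the teacher features are not needed, which is consistent with the stated goal of removing external alignment from the pipeline.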

Findings

01 Achieves state-of-the-art performance on multi-view NVS tasks.

02 Synthesis quality improves as the number of unposed input views increases.

03 Removes reliance on external pose estimation, making multi-view NVS more accessible.

Abstract

Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data. However, existing methods depend on external multi-view alignment processes, such as explicit pose estimation or pre-reconstruction, which limits their flexibility and accessibility, especially when alignment is unstable due to insufficient overlap or occlusions between views. In this paper, we propose NVComposer, a novel approach that eliminates the need for explicit external alignment. NVComposer enables the generative model to implicitly infer spatial and geometric relationships between multiple conditional views by introducing two key components: 1) an image-pose dual-stream diffusion model that simultaneously generates target novel views and condition camera poses, and 2) a geometry-aware feature alignment module that distills geometric priors from dense stereo…

Code & Models

Models

TencentARC/NVComposer (Hugging Face model · 94 downloads · 7 likes)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques

Methods: Diffusion