NOVA3D: Normal Aligned Video Diffusion Model for Single Image to 3D Generation

Yuxiao Yang; Peihao Li; Yuhong Zhang; Junzhe Lu; Xianglong He; Minghan Qin; Weitao Wang; Haoqian Wang

arXiv:2506.07698·cs.CV·June 10, 2025

TL;DR

NOVA3D introduces a novel framework that leverages pretrained video diffusion models and geometric alignment techniques to generate high-quality 3D content from a single image with improved multi-view consistency.

Contribution

The paper proposes NOVA3D, a single-image-to-3D framework that combines 3D priors from a pretrained video diffusion model with a Geometry-Temporal Alignment (GTA) attention mechanism and a de-conflict geometry fusion algorithm.

Findings

1. Outperforms existing methods in multi-view consistency
2. Achieves higher texture fidelity and pose accuracy
3. Demonstrates superior generalization in 3D reconstruction

Abstract

3D AI-generated content (AIGC) has made it increasingly accessible for anyone to become a 3D content creator. While recent methods leverage Score Distillation Sampling to distill 3D objects from pretrained image diffusion models, they often suffer from inadequate 3D priors, leading to insufficient multi-view consistency. In this work, we introduce NOVA3D, an innovative single-image-to-3D generation framework. Our key insight lies in leveraging strong 3D priors from a pretrained video diffusion model and integrating geometric information during multi-view video fine-tuning. To facilitate information exchange between color and geometric domains, we propose the Geometry-Temporal Alignment (GTA) attention mechanism, thereby improving generalization and multi-view consistency. Moreover, we introduce the de-conflict geometry fusion algorithm, which improves texture fidelity by addressing…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis