DT-NVS: Diffusion Transformers for Novel View Synthesis

Wonbong Jang; Jonathan Tremblay; Lourdes Agapito

arXiv:2511.08823 · cs.CV · November 13, 2025

TL;DR

This paper introduces DT-NVS, a 3D diffusion model with a transformer architecture for generalized novel view synthesis from a single image. The model is trained on real-world videos and outperforms existing methods in diversity and quality.

Contribution

The paper presents a novel 3D diffusion model with a transformer backbone, new camera conditioning strategies, and a new training paradigm for real-world, unaligned datasets.

Findings

01. Outperforms state-of-the-art 3D diffusion models.

02. Generates diverse and high-quality novel views.

03. Effective on real-world, unaligned video datasets.

Abstract

Generating novel views of a natural scene (e.g., everyday indoor and outdoor scenes) from a single view is an under-explored problem, even though it is an organic extension of object-centric novel view synthesis. Existing diffusion-based approaches either focus on small camera movements in real scenes or consider only unnatural object-centric scenes, which limits their potential applications in real-world settings. In this paper, we move away from these constrained regimes and propose a 3D diffusion model trained with image-only losses on a large-scale dataset of real-world, multi-category, unaligned, and casually acquired videos of everyday scenes. We propose DT-NVS, a 3D-aware diffusion model for generalized novel view synthesis that exploits a transformer-based architecture backbone. We make significant contributions to transformer and self-attention architectures to translate…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Image Enhancement Techniques