BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz

TL;DR
BetterScene enhances novel view synthesis from sparse photos by aligning diffusion model representations and integrating 3D Gaussian Splatting, resulting in more consistent, artifact-free 3D scene reconstructions.
Contribution
BetterScene introduces an approach that combines regularization of the diffusion model's latent space with 3D Gaussian Splatting to improve 3D scene synthesis.
Findings
Outperforms state-of-the-art methods on the DL3DV-10K dataset.
Produces continuous, artifact-free novel views.
Enhances view consistency in 3D scene synthesis.
Abstract
We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model pretrained on billions of frames as a strong backbone, aiming to mitigate artifacts and recover view-consistent details at inference time. Conventional methods have developed similar diffusion-based solutions to address these challenges of novel view synthesis. Despite significant improvements, these methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which still leads to inconsistent details and artifacts even when incorporating geometry-aware regularizations like depth or semantic conditions. To address this, we investigate the latent space of the diffusion model…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
