GeoNVS: Geometry Grounded Video Diffusion for Novel View Synthesis

Minjun Kang, Inkyu Shin, Taeyeop Lee, Myungchul Kim, In So Kweon, and Kuk-Jin Yoon

arXiv:2603.14965 · cs.CV · March 17, 2026

TL;DR

GeoNVS is a geometry-grounded video diffusion method for novel view synthesis. It uses explicit 3D geometric guidance to improve both geometric fidelity and camera controllability.

Contribution

The paper proposes the Gaussian Splat Feature Adapter (GS-Adapter), a feature adapter that lifts input-view diffusion features into 3D Gaussian representations for improved view synthesis without additional training.

Findings

01. Achieves state-of-the-art performance across multiple scenes and settings.

02. Improves geometric accuracy with up to 2x reduction in translation error.

03. Outperforms prior methods by 11.3% and 14.9% in key metrics.

Abstract

Novel view synthesis requires strong 3D geometric consistency and the ability to generate visually coherent images across diverse viewpoints. While recent camera-controlled video diffusion models show promising results, they often suffer from geometric distortions and limited camera controllability. To overcome these challenges, we introduce GeoNVS, a geometry-grounded novel-view synthesizer that enhances both geometric fidelity and camera controllability through explicit 3D geometric guidance. Our key innovation is the Gaussian Splat Feature Adapter (GS-Adapter), which lifts input-view diffusion features into 3D Gaussian representations, renders geometry-constrained novel-view features, and adaptively fuses them with diffusion features to correct geometrically inconsistent representations. Unlike prior methods that inject geometry at the input level, GS-Adapter operates in feature…
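The abstract says GS-Adapter "adaptively fuses" rendered, geometry-constrained features with diffusion features, but the visible text does not give the fusion rule. As a rough illustration only, the sketch below assumes a per-location gated blend in which a weight in [0, 1] decides how much to trust the rendered feature. The function name `fuse_features`, the `alpha` weights, and the linear blend itself are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def fuse_features(
    diffusion_feat: List[Vector],
    rendered_feat: List[Vector],
    alpha: List[float],
) -> List[Vector]:
    """Blend two feature maps location by location.

    ASSUMPTION: a simple convex combination stands in for the paper's
    adaptive fusion, which is not specified in the abstract.
    alpha[i] near 1 trusts the rendered (geometry-constrained) feature,
    alpha[i] near 0 keeps the original diffusion feature.
    """
    if not (len(diffusion_feat) == len(rendered_feat) == len(alpha)):
        raise ValueError("all inputs must cover the same number of locations")

    fused: List[Vector] = []
    for d_vec, r_vec, a in zip(diffusion_feat, rendered_feat, alpha):
        if len(d_vec) != len(r_vec):
            raise ValueError("feature vectors must have the same channel count")
        a = min(max(a, 0.0), 1.0)  # clamp the gate to [0, 1]
        fused.append([a * r + (1.0 - a) * d for d, r in zip(d_vec, r_vec)])
    return fused


if __name__ == "__main__":
    # Two locations with two channels each (toy values, not real features).
    diffusion = [[1.0, 2.0], [1.0, 2.0]]
    rendered = [[3.0, 4.0], [3.0, 4.0]]
    print(fuse_features(diffusion, rendered, alpha=[0.0, 1.0]))
```

In this toy setup a weight of 0 leaves the diffusion feature untouched and a weight of 1 replaces it with the rendered one. In a real model the weights would presumably be predicted per location rather than supplied by hand.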

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies