GeoNVS: Geometry Grounded Video Diffusion for Novel View Synthesis
Minjun Kang, Inkyu Shin, Taeyeop Lee, Myungchul Kim, In So Kweon, and Kuk-Jin Yoon

TL;DR
GeoNVS introduces a geometry-grounded video diffusion method that significantly improves novel view synthesis by explicitly incorporating 3D geometric guidance, leading to better geometric fidelity and camera control.
Contribution
The paper proposes GS-Adapter, a novel feature adapter that lifts 2D features into 3D Gaussian representations for improved view synthesis without additional training.
Findings
Achieves state-of-the-art performance across multiple scenes and settings.
Improves geometric accuracy with up to 2x reduction in translation error.
Outperforms prior methods by 11.3% and 14.9% in key metrics.
Abstract
Novel view synthesis requires strong 3D geometric consistency and the ability to generate visually coherent images across diverse viewpoints. While recent camera-controlled video diffusion models show promising results, they often suffer from geometric distortions and limited camera controllability. To overcome these challenges, we introduce GeoNVS, a geometry-grounded novel-view synthesizer that enhances both geometric fidelity and camera controllability through explicit 3D geometric guidance. Our key innovation is the Gaussian Splat Feature Adapter (GS-Adapter), which lifts input-view diffusion features into 3D Gaussian representations, renders geometry-constrained novel-view features, and adaptively fuses them with diffusion features to correct geometrically inconsistent representations. Unlike prior methods that inject geometry at the input level, GS-Adapter operates in feature…
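The abstract says that rendered, geometry-constrained features are adaptively fused with diffusion features, but the excerpt does not give the fusion rule. The sketch below is only a minimal illustration of one generic option, a per-pixel confidence-gated blend. The function name `fuse_features`, the `confidence` map, and the array shapes are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fuse_features(diffusion_feat, rendered_feat, confidence):
    """Hypothetical gated blend (an assumption, not the paper's GS-Adapter).

    diffusion_feat, rendered_feat: arrays of shape (H, W, C).
    confidence: array of shape (H, W), treated here as how much the
    rendered (geometry-constrained) feature is trusted at each pixel.
    """
    # Clamp to [0, 1] and add a channel axis so the gate broadcasts over C.
    gate = np.clip(confidence, 0.0, 1.0)[..., None]
    # Convex combination: gate = 1 keeps the rendered feature,
    # gate = 0 keeps the original diffusion feature.
    return (1.0 - gate) * diffusion_feat + gate * rendered_feat


# Toy inputs: random features and a confidence map that trusts the
# rendered features only in the left half of the image.
H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
diffusion_feat = rng.standard_normal((H, W, C))
rendered_feat = rng.standard_normal((H, W, C))
confidence = np.zeros((H, W))
confidence[:, :2] = 1.0

fused = fuse_features(diffusion_feat, rendered_feat, confidence)
```

In this toy setup the left half of `fused` equals the rendered features and the right half equals the diffusion features. A learned gate, if the method uses one, would replace the hand-set `confidence` map.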
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies
