IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation
Yuqi Wu, Tianyu Hu, Wenzhao Zheng, Yuanhui Huang, Haowen Sun, Jie Zhou, Jiwen Lu

TL;DR
IVGT introduces an implicit neural scene representation that models continuous 3D geometry from pose-free multi-view images, enabling high-quality reconstruction and rendering from arbitrary viewpoints.
Contribution
The paper presents IVGT, a novel implicit transformer-based model that predicts continuous 3D geometry without explicit pointmaps, improving coherence and generalization across scenes.
Findings
IVGT achieves state-of-the-art results in mesh and point cloud reconstruction.
It enables high-quality novel view synthesis and surface normal estimation.
The model generalizes well across diverse scenes and tasks.
Abstract
Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, often suffering from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a continuous neural scene representation in a canonical coordinate system and supports continuous spatial queries at any 3D position, retrieving local features to predict signed distance (SDF) values and colors using lightweight decoders. It allows direct extraction of continuous and coherent surface geometry, enabling rendering of RGB images, depth maps, and surface normal maps from arbitrary…
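To make the idea of querying a signed distance field at arbitrary 3D positions concrete, here is a minimal sketch of how a depth value and a surface normal can be read off an SDF along one camera ray. It is an illustration under stated assumptions, not the paper's method: the analytic sphere `sdf_sphere` stands in for IVGT's learned feature lookup and decoder, sphere tracing is a generic SDF rendering technique that may differ from the renderer the authors use, and all function names are hypothetical.

```python
import numpy as np


def sdf_sphere(p, radius=1.0):
    """Signed distance to a sphere at the origin.

    Hypothetical stand-in for a learned SDF decoder: negative inside,
    zero on the surface, positive outside.
    """
    return np.linalg.norm(p, axis=-1) - radius


def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4, far=10.0):
    """March along a ray, advancing by the SDF value at each step.

    Returns the distance along the ray to the surface, or None on a miss.
    """
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:
            return t
        t += d
        if t > far:
            break
    return None


def sdf_normal(sdf, p, h=1e-4):
    """Surface normal as the normalized central-difference gradient of the SDF."""
    offsets = h * np.eye(3)
    grad = np.array([sdf(p + o) - sdf(p - o) for o in offsets])
    return grad / np.linalg.norm(grad)


# One ray looking down the +z axis at a unit sphere.
origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])

depth = sphere_trace(sdf_sphere, origin, direction)
hit = origin + depth * direction
normal = sdf_normal(sdf_sphere, hit)
```

Repeating this for every pixel's ray would give a depth map and a normal map; because the field can be queried at any point, the camera can be placed at an arbitrary viewpoint.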
