IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation

Yuqi Wu; Tianyu Hu; Wenzhao Zheng; Yuanhui Huang; Haowen Sun; Jie Zhou; Jiwen Lu

arXiv:2605.16258 · cs.CV · May 22, 2026

TL;DR

IVGT introduces an implicit neural scene representation that models continuous 3D geometry from pose-free multi-view images, enabling high-quality reconstruction and rendering from arbitrary viewpoints.

Contribution

The paper presents IVGT, a novel implicit transformer-based model that predicts continuous 3D geometry without explicit pointmaps, improving coherence and generalization across scenes.

Findings

01. IVGT achieves state-of-the-art results in mesh and point cloud reconstruction.

02. IVGT enables high-quality novel view synthesis and surface normal estimation.

03. The model generalizes well across diverse scenes and tasks.

Abstract

Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, and often suffer from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a continuous neural scene representation in a canonical coordinate system and supports continuous spatial queries at any 3D position, retrieving local features to predict signed distance function (SDF) values and colors with lightweight decoders. It allows direct extraction of continuous and coherent surface geometry, enabling rendering of RGB images, depth maps, and surface normal maps from arbitrary…
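The query step described in the abstract (a 3D position retrieves local features, and lightweight decoders map them to an SDF value and a color) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration, not the paper's architecture: the dense feature volume `grid`, trilinear interpolation as the retrieval rule, the two-layer MLP decoders, and the names `trilinear_sample`, `init_mlp`, `mlp`, and `query_scene` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def trilinear_sample(grid, pts):
    """Interpolate per-point features from a dense volume (hypothetical stand-in
    for "retrieving local features"). grid: (R, R, R, C) over [-1, 1]^3; pts: (N, 3)."""
    R = grid.shape[0]
    # Map canonical coordinates in [-1, 1] to continuous voxel indices in [0, R - 1].
    x = (np.clip(pts, -1.0, 1.0) + 1.0) * 0.5 * (R - 1)
    i0 = np.clip(np.floor(x).astype(int), 0, R - 2)  # lower corner of the enclosing cell
    t = x - i0                                       # fractional offset inside the cell
    out = np.zeros((pts.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = t[:, 0] if dx else 1.0 - t[:, 0]
                wy = t[:, 1] if dy else 1.0 - t[:, 1]
                wz = t[:, 2] if dz else 1.0 - t[:, 2]
                corner = grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += (wx * wy * wz)[:, None] * corner
    return out

def init_mlp(rng, sizes):
    """Random (untrained) weights for a small fully connected decoder."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply linear layers with ReLU between them (no activation on the output)."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def query_scene(grid, sdf_weights, rgb_weights, pts):
    """Return (sdf, rgb) at arbitrary 3D positions. Concatenating the position
    with its local feature is an assumption of this sketch."""
    feats = trilinear_sample(grid, pts)
    inp = np.concatenate([feats, pts], axis=1)
    sdf = mlp(inp, sdf_weights)[:, 0]                   # one signed distance per point
    rgb = 1.0 / (1.0 + np.exp(-mlp(inp, rgb_weights)))  # sigmoid keeps colors in [0, 1]
    return sdf, rgb

rng = np.random.default_rng(0)
C = 8                                         # feature channels (arbitrary choice)
grid = rng.normal(size=(16, 16, 16, C))       # placeholder for a learned scene encoding
sdf_weights = init_mlp(rng, [C + 3, 32, 1])
rgb_weights = init_mlp(rng, [C + 3, 32, 3])
pts = rng.uniform(-1.0, 1.0, size=(5, 3))     # continuous query positions
sdf, rgb = query_scene(grid, sdf_weights, rgb_weights, pts)
print(sdf.shape, rgb.shape)
```

Because the query accepts any position in the canonical volume, a surface could in principle be extracted by evaluating `query_scene` on a regular lattice and locating the zero level set of `sdf`; with the random weights used here the resulting values carry no geometric meaning.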
