Generalizing Visual Geometry Priors to Sparse Gaussian Occupancy Prediction

Changqing Zhou; Yueru Luo; Changhao Chen

arXiv:2602.21552 · cs.CV · February 26, 2026

TL;DR

This paper introduces GPOcc, a framework that leverages generalizable visual geometry priors for 3D occupancy prediction from monocular images, improving mIoU by +9.99 in the monocular setting and +11.79 in the streaming setting while running 2.65× faster than prior methods.

Contribution

GPOcc extends visual geometry priors to volumetric occupancy prediction using Gaussian primitives and introduces a streaming update strategy for real-time applications.

Findings

01 GPOcc improves mIoU by +9.99 in the monocular setting.

02 GPOcc improves mIoU by +11.79 in the streaming setting.

03 GPOcc runs 2.65× faster than prior methods.
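
For readers unfamiliar with the metric behind the first two findings, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed over per-voxel class labels. It is a generic illustration, not the paper's evaluation code: the function name `mean_iou`, the `ignore_label` parameter, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made here, and benchmark-specific conventions may differ.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over flat sequences of per-voxel class labels.

    Hypothetical helper for illustration; not taken from the GPOcc paper.
    """
    inter = [0] * num_classes  # voxels where prediction and label agree, per class
    union = [0] * num_classes  # voxels where either prediction or label is the class

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels without a valid ground-truth label
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1
            union[t] += 1

    # Assumption: classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)` averages per-class IoUs of 1.0, 0.5, and 0.5.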

Abstract

Accurate 3D scene understanding is essential for embodied intelligence, with occupancy prediction emerging as a key task for reasoning about both objects and free space. Existing approaches largely rely on depth priors (e.g., DepthAnything) but make only limited use of 3D cues, restricting performance and generalization. Recently, visual geometry models such as VGGT have shown strong capability in providing rich 3D priors. However, like monocular depth foundation models, they still operate at the level of visible surfaces rather than volumetric interiors. This motivates us to explore how to leverage these increasingly powerful geometry priors more effectively for 3D occupancy prediction. We present GPOcc, a framework that leverages generalizable visual geometry priors (GPs) for monocular occupancy prediction. Our method extends surface points inward along camera rays to generate volumetric…
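
The idea of extending surface points inward along camera rays can be illustrated with a small geometric sketch. The abstract is truncated here and does not state how sample positions are chosen, so the fixed `offsets` list, the function name `extend_along_ray`, and the use of a single camera center are all assumptions for illustration rather than the authors' method.

```python
import math


def extend_along_ray(camera_center, surface_point, offsets):
    """Return 3D points placed behind a visible surface point along its camera ray.

    Hypothetical helper: each offset is an extra distance (same units as the
    inputs) added to the surface depth, measured from the camera center.
    """
    direction = [s - c for s, c in zip(surface_point, camera_center)]
    depth = math.sqrt(sum(d * d for d in direction))
    if depth == 0.0:
        raise ValueError("surface point coincides with the camera center")

    unit = [d / depth for d in direction]  # unit vector from camera to surface

    # An offset of 0 reproduces the surface point; positive offsets move
    # farther from the camera, i.e. into the volume behind the surface.
    return [
        [c + (depth + o) * u for c, u in zip(camera_center, unit)]
        for o in offsets
    ]
```

Calling `extend_along_ray([0, 0, 0], [0, 0, 2], [0.0, 0.5, 1.0])` yields points at depths 2.0, 2.5, and 3.0 along the z-axis, which shows how a single visible surface sample can seed several volumetric candidates.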

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Face recognition and analysis