VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction
Xun Chen, Tianchen Deng, Rui Wang, Fangjinhua Wang, Junyi Ma, Hongming Shen, Hesheng Wang, Danwei Wang

TL;DR
VGGT-Occ is a geometry-grounded, density-aware fusion framework for 3D occupancy prediction. It keeps camera geometry in play beyond the initial 2D-to-3D projection, carrying it through offset learning, attention weighting, and cross-camera aggregation.
Contribution
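The paper proposes Projection-Aware Deformable Attention (PA-DA), which injects camera geometry into all attention stages, and a view-quality semantic gate that integrates features for cross-view consistency. A sequential coarse-to-fine decoder with gated fusion is used to balance efficiency and performance.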
Findings
Achieves 33.00% IoU on SurroundOcc-nuScenes with 41M parameters.
Outperforms existing methods in 3D occupancy prediction accuracy.
Reduces decoder cost while maintaining high performance.
Abstract
3D semantic occupancy prediction requires accurate 2D-to-3D feature lifting, yet current methods restrict camera geometry to initial projections. Subsequent operations like offset learning, attention weighting, and cross-camera aggregation remain geometry-agnostic, ignoring essential physical constraints. We propose VGGT-Occ, a framework that embeds geometric tokens throughout the entire pipeline. We introduce Projection-Aware Deformable Attention (PA-DA) to inject geometry into all attention stages. PA-DA projects 3D offsets back to image planes and leverages the projection Jacobian as an additive bias to suppress unreliable observations. Features are then integrated through a view-quality semantic gate for cross-view consistency. To optimize both efficiency and performance, we employ a sequential coarse-to-fine decoder with gated fusion, where low-resolution features are refined into…
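The PA-DA step described in the abstract (projecting 3D offsets to the image plane and using the projection Jacobian as an additive attention bias) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pinhole camera model, the function names, and in particular the bias form (half the log-determinant of J Jᵀ, with minus infinity for points behind the camera) are assumptions chosen for illustration, since the abstract does not specify how the Jacobian enters the bias.

```python
import numpy as np

def project(K, p_cam):
    """Pinhole projection of a 3D point (camera coordinates) to pixels."""
    x, y, z = p_cam
    return np.array([K[0, 0] * x / z + K[0, 2], K[1, 1] * y / z + K[1, 2]])

def projection_jacobian(K, p_cam):
    """2x3 Jacobian d(u, v) / d(x, y, z) of the pinhole projection."""
    x, y, z = p_cam
    fx, fy = K[0, 0], K[1, 1]
    return np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])

def project_offset(K, p_cam, offset_3d):
    """First-order image-plane offset induced by a small 3D offset."""
    return projection_jacobian(K, p_cam) @ offset_3d

def geometry_bias(K, p_cam):
    """Assumed bias form (hypothetical): 0.5 * log det(J J^T).

    Distant points have a small Jacobian (one pixel spans a large 3D
    extent), so they receive a lower bias. Points behind the camera
    are masked out with -inf.
    """
    if p_cam[2] <= 0:
        return -np.inf
    J = projection_jacobian(K, p_cam)
    return 0.5 * np.log(np.linalg.det(J @ J.T))

def biased_view_weights(content_logits, biases):
    """Softmax over views of content logits plus the additive geometry bias."""
    logits = np.asarray(content_logits, dtype=float) + np.asarray(biases)
    logits = logits - np.max(logits)  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Illustrative intrinsics and one 3D query seen by three hypothetical cameras:
# near (5 m), far (50 m), and behind the image plane.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
points = [np.array([0.0, 0.0, 5.0]),
          np.array([0.0, 0.0, 50.0]),
          np.array([0.0, 0.0, -5.0])]
biases = [geometry_bias(K, p) for p in points]
weights = biased_view_weights([0.0, 0.0, 0.0], biases)
print(weights)
```

With equal content logits, the near view dominates, the far view is strongly down-weighted, and the view in which the point lies behind the camera gets zero weight. Any monotone function of the Jacobian could play the same role; the log-determinant is used here only because it turns the local pixel-area scale into an additive term.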
