ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

Yi Feng; Yu Han; Xijing Zhang; Tanghui Li; Yanting Zhang; Rui Fan

arXiv:2412.11210·cs.CV·January 13, 2025

TL;DR

ViPOcc uses visual priors from vision foundation models for fine-grained, instance-aware 3D occupancy prediction from a single image, improving accuracy and consistency in autonomous driving scenarios.

Contribution

The paper proposes ViPOcc, integrating visual priors, a metric depth estimation branch, and a semantic-guided sampling method for enhanced 3D scene understanding.

Findings

01 Outperforms existing methods on KITTI datasets
02 Achieves superior depth estimation accuracy
03 Provides consistent 3D occupancy predictions

Abstract

Inferring the 3D structure of a scene from a single image is an ill-posed and challenging problem in the field of vision-centric autonomous driving. Existing methods usually employ neural radiance fields to produce voxelized 3D occupancy, lacking instance-level semantic reasoning and temporal photometric consistency. In this paper, we propose ViPOcc, which leverages the visual priors from vision foundation models (VFMs) for fine-grained 3D occupancy prediction. Unlike previous works that solely employ volume rendering for RGB and depth image reconstruction, we introduce a metric depth estimation branch, in which an inverse depth alignment module is proposed to bridge the domain gap in depth distribution between VFM predictions and the ground truth. The recovered metric depth is then utilized in temporal photometric alignment and spatial geometric alignment to ensure accurate and…
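The abstract mentions an inverse depth alignment module but does not spell out its form. As a rough illustration only, the sketch below shows one common way to align a relative depth prediction to metric references: a least-squares fit of a scale and a shift in inverse-depth space. This is an assumption for illustration, not the paper's confirmed method; the function name `align_inverse_depth` and its parameters are hypothetical.

```python
import numpy as np


def align_inverse_depth(pred_depth, ref_depth, valid_mask):
    """Fit scale s and shift t so that s * (1 / pred) + t ~= 1 / ref.

    Hypothetical helper (not taken from the ViPOcc code base).
    pred_depth: depth map predicted by a foundation model, shape (H, W)
    ref_depth:  reference metric depth, shape (H, W)
    valid_mask: boolean mask of pixels where ref_depth is trustworthy
    """
    # Work in inverse-depth space on the valid pixels only.
    pred_inv = 1.0 / pred_depth[valid_mask]
    ref_inv = 1.0 / ref_depth[valid_mask]

    # Solve min_{s,t} || s * pred_inv + t - ref_inv ||^2 in closed form.
    design = np.stack([pred_inv, np.ones_like(pred_inv)], axis=1)
    (s, t), *_ = np.linalg.lstsq(design, ref_inv, rcond=None)

    # Apply the fit to every pixel and convert back to depth.
    aligned_inv = s * (1.0 / pred_depth) + t
    aligned_inv = np.clip(aligned_inv, 1e-6, None)  # avoid division by zero
    return 1.0 / aligned_inv, s, t
```

Fitting in inverse depth rather than depth gives nearby structure more weight than far-away pixels, which is one reason this formulation is popular for monocular depth; whether ViPOcc makes the same choice for the same reason is not stated in the truncated abstract.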

Code & Models

Repositories

fengyi233/ViPOcc (PyTorch, official)

Videos

ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis