VPOcc: Exploiting Vanishing Point for 3D Semantic Occupancy Prediction

Junsu Kim; Junhee Lee; Ukcheol Shin; Jean Oh; Kyungdon Joo

arXiv:2408.03551 · cs.CV · August 15, 2025


TL;DR

VPOcc introduces a novel framework that leverages vanishing point information to improve 3D semantic occupancy prediction from 2D images, addressing perspective-induced scale discrepancies for better scene understanding.

Contribution

The paper proposes VPOcc, a framework that uses vanishing point-based modules for perspective-aware image warping and feature aggregation, enhancing 3D scene prediction accuracy.

Findings

01 Achieved higher IoU and mIoU than baseline methods on the SemanticKITTI and SSCBench-KITTI360 datasets.

02 Demonstrated that the vanishing point-guided modules are effective in mitigating perspective distortion.

03 Improved 3D semantic occupancy prediction accuracy over baseline methods.

Abstract

Understanding 3D scenes semantically and spatially is crucial for the safe navigation of robots and autonomous vehicles, aiding obstacle avoidance and accurate trajectory planning. Camera-based 3D semantic occupancy prediction, which infers complete voxel grids from 2D images, is gaining importance in robot vision for its resource efficiency compared to 3D sensors. However, this task inherently suffers from a 2D-3D discrepancy, where objects of the same size in 3D space appear at different scales in a 2D image depending on their distance from the camera due to perspective projection. To tackle this issue, we propose a novel framework called VPOcc that leverages a vanishing point (VP) to mitigate the 2D-3D discrepancy at both the pixel and feature levels. As a pixel-level solution, we introduce a VPZoomer module, which warps images by counteracting the perspective effect using a VP-based…
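The 2D-3D discrepancy described in the abstract follows directly from the pinhole camera model, where an object of physical height H at depth Z projects to roughly f·H/Z pixels. The sketch below only illustrates that relationship; it is not part of VPOcc, and the focal length, object height, and function name are hypothetical values chosen for illustration.

```python
def projected_height_px(object_height_m: float, depth_m: float, focal_length_px: float) -> float:
    """Image-plane height (pixels) of an object under an ideal pinhole camera."""
    if depth_m <= 0:
        raise ValueError("depth must be positive (object in front of the camera)")
    return focal_length_px * object_height_m / depth_m


# Hypothetical values, not taken from the paper or any dataset calibration.
FOCAL_PX = 700.0      # assumed focal length in pixels
CAR_HEIGHT_M = 1.5    # assumed physical height of the same object

# The same 3D object placed at three distances from the camera.
heights = {z: projected_height_px(CAR_HEIGHT_M, z, FOCAL_PX) for z in (5.0, 10.0, 50.0)}

for depth, h in heights.items():
    print(f"depth {depth:5.1f} m -> {h:6.1f} px")
```

Under these assumed values, the same object covers ten times fewer pixels in height at 50 m than at 5 m, which is the kind of scale gap the abstract says a vanishing point can help counteract.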


Code & Models

Datasets

joonsu0109/vpocc-vanishing-points
dataset · 8 dl


Taxonomy

Topics: Human Pose and Action Recognition · AI in cancer detection · Video Analysis and Summarization