OC-SOP: Enhancing Vision-Based 3D Semantic Occupancy Prediction by Object-Centric Awareness

Helin Cao; Sven Behnke

arXiv:2506.18798 · cs.CV · August 14, 2025

TL;DR

OC-SOP introduces an object-centric approach to 3D semantic occupancy prediction from images for autonomous driving perception, improving accuracy on foreground objects and achieving state-of-the-art results on SemanticKITTI.

Contribution

The paper presents an object-centric framework that integrates high-level object cues, extracted via a detection branch, into semantic occupancy prediction, addressing the limitations of methods that rely primarily on local features.

Findings

01. Achieves state-of-the-art performance on SemanticKITTI

02. Significantly improves foreground object prediction accuracy

03. Effectively handles occlusions and incomplete scene data

Abstract

Autonomous driving perception faces significant challenges due to occlusions and incomplete scene data in the environment. To overcome these issues, the task of semantic occupancy prediction (SOP) has been proposed, which aims to jointly infer both the geometry and the semantic labels of a scene from images. However, conventional camera-based methods typically treat all categories equally and rely primarily on local features, leading to suboptimal predictions, especially for dynamic foreground objects. To address this, we propose Object-Centric SOP (OC-SOP), a framework that integrates high-level object-centric cues extracted via a detection branch into the semantic occupancy prediction pipeline. This object-centric integration significantly enhances the prediction accuracy for foreground objects and achieves state-of-the-art performance across all categories on SemanticKITTI.
