OC-SOP: Enhancing Vision-Based 3D Semantic Occupancy Prediction by Object-Centric Awareness
Helin Cao, Sven Behnke

TL;DR
OC-SOP introduces an object-centric approach to vision-based 3D semantic occupancy prediction. It substantially improves accuracy on foreground objects and achieves state-of-the-art results on SemanticKITTI for autonomous driving perception.
Contribution
The paper presents a novel object-centric framework that integrates high-level object cues, extracted by a detection branch, into semantic occupancy prediction. This addresses the limitations of conventional methods that rely primarily on local features.
Findings
Achieves state-of-the-art performance on SemanticKITTI
Significantly improves foreground object prediction accuracy
Effectively handles occlusions and incomplete scene data
Abstract
Autonomous driving perception faces significant challenges due to occlusions and incomplete scene data. To overcome these issues, the task of semantic occupancy prediction (SOP) has been proposed, which aims to jointly infer both the geometry and the semantic labels of a scene from images. However, conventional camera-based methods typically treat all categories equally and rely primarily on local features, leading to suboptimal predictions, especially for dynamic foreground objects. To address this, we propose Object-Centric SOP (OC-SOP), a framework that integrates high-level object-centric cues extracted via a detection branch into the semantic occupancy prediction pipeline. This object-centric integration significantly enhances prediction accuracy for foreground objects and achieves state-of-the-art performance among all categories on SemanticKITTI.
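To make the idea of injecting object-level cues into a voxel-based occupancy pipeline more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes that detections arrive as axis-aligned 3D boxes already expressed in voxel index coordinates (a simplification, since real detectors usually output oriented boxes in metric space). It also assumes that fusion is done by concatenating a per-voxel class-score prior onto the existing voxel features. The names `Box3D`, `object_prior_grid`, and `fuse_features` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Nested lists indexed as grid[x][y][z][channel].
Grid = List[List[List[List[float]]]]


@dataclass
class Box3D:
    """Hypothetical axis-aligned detection in voxel indices (lo inclusive, hi exclusive)."""
    lo: Tuple[int, int, int]
    hi: Tuple[int, int, int]
    class_id: int
    score: float


def object_prior_grid(shape: Tuple[int, int, int], boxes: List[Box3D], num_classes: int) -> Grid:
    """Rasterize detections into a per-voxel class-score prior (assumed encoding)."""
    nx, ny, nz = shape
    grid = [[[[0.0] * num_classes for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]
    for box in boxes:
        # Clip each box to the grid so out-of-range detections are ignored safely.
        for x in range(max(box.lo[0], 0), min(box.hi[0], nx)):
            for y in range(max(box.lo[1], 0), min(box.hi[1], ny)):
                for z in range(max(box.lo[2], 0), min(box.hi[2], nz)):
                    cell = grid[x][y][z]
                    # Overlapping boxes of the same class keep the highest score.
                    cell[box.class_id] = max(cell[box.class_id], box.score)
    return grid


def fuse_features(features: Grid, prior: Grid) -> Grid:
    """Concatenate the object prior onto each voxel's feature vector (assumed fusion rule)."""
    return [
        [
            [f_cell + p_cell for f_cell, p_cell in zip(f_row, p_row)]
            for f_row, p_row in zip(f_plane, p_plane)
        ]
        for f_plane, p_plane in zip(features, prior)
    ]


if __name__ == "__main__":
    shape = (4, 4, 2)
    # One feature channel per voxel, all set to 1.0 for illustration.
    features = [[[[1.0] for _ in range(shape[2])] for _ in range(shape[1])] for _ in range(shape[0])]
    boxes = [Box3D(lo=(1, 1, 0), hi=(3, 3, 2), class_id=0, score=0.9)]
    prior = object_prior_grid(shape, boxes, num_classes=2)
    fused = fuse_features(features, prior)
    print(fused[1][1][0])  # voxel inside the box
    print(fused[0][0][0])  # voxel outside the box
```

Concatenation is only the simplest choice here. Attention-based or gated fusion would be equally plausible ways to combine the two streams, and the abstract does not say which mechanism OC-SOP uses.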
