VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
Wuyang Li, Zhu Yu, Alexandre Alahi

TL;DR
VoxDet introduces an instance-centric approach to 3D semantic occupancy prediction by reformulating it as dense object detection, leveraging voxel-level class labels for improved instance discrimination and achieving state-of-the-art results.
Contribution
The paper proposes VoxDet, a novel framework that converts occupancy prediction into dense detection using offset regression and semantic prediction, enhancing instance-level accuracy.
Findings
Achieves 63.0 IoU on the SemanticKITTI test set.
Outperforms previous methods on both camera and LiDAR benchmarks.
Efficiently combines voxel-based detection with instance discrimination.
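For readers unfamiliar with the metric, the IoU reported on semantic scene completion benchmarks such as SemanticKITTI is usually the class-agnostic occupancy IoU: every non-empty voxel counts as "occupied", whatever its class. The sketch below shows that computation under stated assumptions. The function name `completion_iou`, the empty class id `0`, and the ignore label `255` are illustrative choices, not taken from the paper.

```python
import numpy as np

def completion_iou(pred, gt, empty_class=0, ignore_label=255):
    """Class-agnostic occupancy IoU between two voxel label grids.

    Assumptions (hypothetical, for illustration only):
    - `empty_class` marks free space,
    - `ignore_label` marks voxels excluded from evaluation.
    """
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    valid = gt != ignore_label                 # drop unlabeled voxels
    pred_occ = (pred != empty_class) & valid   # predicted occupied
    gt_occ = (gt != empty_class) & valid       # ground-truth occupied
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    # IoU is undefined when neither grid contains an occupied voxel.
    return float(inter) / float(union) if union > 0 else float("nan")

# Tiny 1D example: one voxel occupied in both, two occupied in only one grid.
example = completion_iou([1, 1, 0, 0], [1, 0, 2, 0])
```

Multiplying the returned ratio by 100 gives a percentage on the same scale as the figure quoted above.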
Abstract
3D semantic occupancy prediction aims to reconstruct the 3D geometry and semantics of the surrounding environment. With dense voxel labels, prior works typically formulate it as a dense segmentation task, independently classifying each voxel. However, this paradigm neglects critical instance-centric discriminability, leading to instance-level incompleteness and adjacent ambiguities. To address this, we highlight a free lunch of occupancy labels: the voxel-level class label implicitly provides insight at the instance level, which is overlooked by the community. Motivated by this observation, we first introduce a training-free Voxel-to-Instance (VoxNT) trick: a simple yet effective method that freely converts voxel-level class labels into instance-level offset labels. Building on this, we further propose VoxDet, an instance-centric framework that reformulates the voxel-level occupancy…
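The truncated abstract does not say how VoxNT derives its offset labels, so the following is only a minimal sketch of one way voxel-level class labels could be turned into offset targets without any training. It assumes that, for each occupied voxel, the offset along a direction is the number of steps one can take while staying in the same class. The function names, the six-direction layout, and `empty_class=0` are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def _backward_run(labels, axis):
    """For each voxel, count same-class voxels directly behind it along `axis`."""
    lab = np.moveaxis(labels, axis, 0)
    run = np.zeros(lab.shape, dtype=np.int32)
    for i in range(1, lab.shape[0]):
        same = lab[i] == lab[i - 1]
        # Extend the run if the previous voxel has the same class, else reset.
        run[i] = np.where(same, run[i - 1] + 1, 0)
    return np.moveaxis(run, 0, axis)

def voxel_to_offsets(labels, empty_class=0):
    """Return offsets of shape (6, X, Y, Z) ordered as -x, +x, -y, +y, -z, +z.

    Each entry is the distance (in voxels) to the point where the class
    changes along that direction. Empty voxels get all-zero offsets.
    """
    labels = np.asarray(labels)
    out = []
    for axis in range(3):
        neg = _backward_run(labels, axis)
        # The forward run is the backward run of the flipped grid, flipped back.
        pos = np.flip(_backward_run(np.flip(labels, axis), axis), axis)
        out.extend([neg, pos])
    offsets = np.stack(out)
    offsets[:, labels == empty_class] = 0
    return offsets

# One row of voxels along z: empty, three voxels of class 1, two of class 2.
row = np.array([0, 1, 1, 1, 2, 2]).reshape(1, 1, 6)
offsets = voxel_to_offsets(row)
```

Because this sketch follows class changes along the axes, two touching objects of the same class are treated as a single run. Whether and how the paper separates such cases is not stated in the text above.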
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
