Fully Sparse 3D Occupancy Prediction
Haisong Liu, Yang Chen, Haiguang Wang, Zetong Yang, Tianyu Li, Jia Zeng, Li Chen, Hongyang Li, Limin Wang

TL;DR
This paper introduces SparseOcc, a fully sparse 3D occupancy prediction network that efficiently reconstructs and predicts scene occupancy from camera inputs, achieving real-time performance and improved accuracy with multiple frames.
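To make the idea of a sparse 3D representation concrete, here is a minimal sketch of one common coarse-to-fine pattern. At each level, every kept voxel is split into its 8 children, the children are scored, and only the top-k survive. The scoring function, the per-level budgets, and every name below (`upsample`, `prune_topk`, `coarse_to_fine`) are hypothetical illustrations under stated assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def upsample(voxels: Iterable[Voxel]) -> List[Voxel]:
    """Split each coarse voxel into its 2x2x2 = 8 children at the next finer level."""
    children: List[Voxel] = []
    for x, y, z in voxels:
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    children.append((2 * x + dx, 2 * y + dy, 2 * z + dz))
    return children


def prune_topk(scores: Dict[Voxel, float], k: int) -> List[Voxel]:
    """Keep the k highest-scoring voxels; ties are broken by coordinate order."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:k]


def coarse_to_fine(
    initial: Iterable[Voxel],
    score_fn: Callable[[Voxel], float],  # hypothetical stand-in for a learned occupancy head
    keep_per_level: Sequence[int],       # hypothetical top-k budget at each level
) -> List[Voxel]:
    """Refine a sparse voxel set level by level, never materializing the dense grid."""
    voxels = list(initial)
    for k in keep_per_level:
        candidates = upsample(voxels)
        scores = {v: score_fn(v) for v in candidates}
        voxels = prune_topk(scores, k)
    return voxels


if __name__ == "__main__":
    # Toy scorer (assumption): voxels closer to z = 0 are more likely occupied.
    kept = coarse_to_fine([(0, 0, 0)], lambda v: -v[2], keep_per_level=[4, 8])
    print(kept)
```

In this sketch the work at each level scales with 8 times the previous budget rather than with the full grid size, which is the property a fully sparse design relies on.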
Contribution
The paper presents a novel fully sparse occupancy network that uses mask-guided sparse sampling to reduce computational cost, together with a new ray-based metric, RayIoU, that addresses the inconsistent depth-axis penalty of traditional voxel-level mIoU.
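Mask-guided sparse sampling can be pictured roughly as follows: each query owns a mask over the sparse voxels, a few 3D points are drawn only from voxels inside that mask, and 2D image features are read only at their projections. This is a simplified sketch under assumptions: a single pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)`, points already expressed in the camera frame, and nearest-pixel lookup plus mean pooling standing in for the bilinear sampling and learned aggregation a real model would likely use. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed layout
FeatureMap = List[List[List[float]]]            # rows x cols x channels


def sample_points_in_mask(
    voxel_centers: Sequence[Point3], mask: Sequence[bool], num_points: int, rng: random.Random
) -> List[Point3]:
    """Draw up to num_points voxel centers whose mask bit is set, without replacement."""
    inside = [c for c, m in zip(voxel_centers, mask) if m]
    if len(inside) <= num_points:
        return inside
    return rng.sample(inside, num_points)


def project(point: Point3, K: Intrinsics) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point
    fx, fy, cx, cy = K
    return fx * x / z + cx, fy * y / z + cy


def gather_features(points: Sequence[Point3], K: Intrinsics, feature_map: FeatureMap) -> List[List[float]]:
    """Nearest-pixel lookup; points behind the camera or outside the image are skipped."""
    h, w = len(feature_map), len(feature_map[0])
    feats = []
    for p in points:
        if p[2] <= 0:
            continue
        u, v = project(p, K)
        col, row = round(u), round(v)
        if 0 <= row < h and 0 <= col < w:
            feats.append(feature_map[row][col])
    return feats


def mask_guided_sample(
    voxel_centers: Sequence[Point3],
    mask: Sequence[bool],
    K: Intrinsics,
    feature_map: FeatureMap,
    num_points: int = 4,
    seed: int = 0,
) -> Optional[List[float]]:
    """Return one pooled feature vector for a query, touching only pixels hit by its masked voxels."""
    rng = random.Random(seed)
    points = sample_points_in_mask(voxel_centers, mask, num_points, rng)
    feats = gather_features(points, K, feature_map)
    if not feats:
        return None
    dim = len(feats[0])
    # Mean pooling is an assumption; a learned aggregation would replace it in practice.
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
```

The point of the design is that no dense 2D or 3D feature volume is ever built: the cost depends on the number of queries and sampled points, not on image resolution or grid size.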
Findings
Achieves 34.0 RayIoU with real-time 17.3 FPS
Improves to 35.1 RayIoU with 15 frames
Outperforms dense methods in efficiency and accuracy
Abstract
Occupancy prediction plays a pivotal role in autonomous driving. Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and suffering from high computational costs. To bridge the gap, we introduce a novel fully sparse occupancy network, termed SparseOcc. SparseOcc first reconstructs a sparse 3D representation from camera-only inputs and then predicts semantic/instance occupancy from that sparse representation using sparse queries. A mask-guided sparse sampling scheme is designed to let the sparse queries interact with 2D features in a fully sparse manner, thereby circumventing costly dense features or global attention. Additionally, we design a ray-based evaluation metric, namely RayIoU, to resolve the inconsistent penalty along the depth axis that arises in traditional voxel-level mIoU criteria. SparseOcc demonstrates its…
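The depth-axis issue that RayIoU targets can be illustrated with a toy example. The sketch below casts rays from one origin, takes the first occupied voxel along each ray in both the prediction and the ground truth, and counts a true positive when the classes match and the depths differ by at most a threshold; per-class IoU is then TP / (TP + FP + FN). The single origin, the fixed-step ray marching, the 1.0-voxel threshold, and all names are assumptions for illustration; the paper's exact ray sampling and thresholds are not given in this excerpt.

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

Voxel = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]
Hit = Optional[Tuple[int, float]]  # (class id, depth along the ray) or None


def first_hit(occ: Dict[Voxel, int], origin: Vec3, direction: Vec3,
              max_depth: float = 50.0, step: float = 0.25) -> Hit:
    """March along a unit-length ray; return class and depth of the first occupied voxel."""
    t = 0.0
    while t <= max_depth:
        p = tuple(math.floor(o + t * d) for o, d in zip(origin, direction))
        if p in occ:
            return occ[p], t
        t += step
    return None


def ray_iou(pred: Dict[Voxel, int], gt: Dict[Voxel, int], origin: Vec3,
            directions: Sequence[Vec3], depth_threshold: float = 1.0):
    """Simplified ray-level IoU: returns (mean over classes, per-class IoU dict)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for d in directions:
        ph = first_hit(pred, origin, d)
        gh = first_hit(gt, origin, d)
        if ph and gh and ph[0] == gh[0] and abs(ph[1] - gh[1]) <= depth_threshold:
            tp[gh[0]] += 1  # same class, depth within tolerance
            continue
        if ph:
            fp[ph[0]] += 1  # predicted surface not matched by ground truth
        if gh:
            fn[gh[0]] += 1  # ground-truth surface not matched by prediction
    classes = set(tp) | set(fp) | set(fn)
    ious = {c: tp[c] / (tp[c] + fp[c] + fn[c]) for c in classes}
    mean = sum(ious.values()) / len(ious) if ious else 0.0
    return mean, ious
```

With ground truth occupied at voxel (5, 0, 0) and a prediction one voxel deeper at (6, 0, 0), a single ray along +x from (0.5, 0.5, 0.5) is scored as a true positive in this sketch (RayIoU 1.0), whereas a voxel-level IoU for that class would be 0 because the two voxels do not overlap.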
