Fully Sparse 3D Occupancy Prediction

Haisong Liu; Yang Chen; Haiguang Wang; Zetong Yang; Tianyu Li; Jia Zeng; Li Chen; Hongyang Li; Limin Wang

arXiv:2312.17118·cs.CV·July 22, 2024·6 cites

TL;DR

This paper introduces SparseOcc, a fully sparse 3D occupancy prediction network that reconstructs a sparse 3D scene representation from camera-only inputs and predicts semantic/instance occupancy from it, reaching 34.0 RayIoU at a real-time 17.3 FPS and 35.1 RayIoU with 15 frames.

Contribution

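The paper contributes a fully sparse occupancy network whose mask-guided sparse sampling lets sparse queries interact with 2D features without dense features or global attention, which reduces computational cost. It also proposes a ray-based metric, RayIoU, that addresses the inconsistent depth-axis penalty of the traditional voxel-level mIoU.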

Findings

01 Achieves 34.0 RayIoU at a real-time 17.3 FPS

02 Improves to 35.1 RayIoU with 15 frames

03 Outperforms dense methods in efficiency and accuracy

Abstract

Occupancy prediction plays a pivotal role in autonomous driving. Previous methods typically construct dense 3D volumes, neglecting the inherent sparsity of the scene and suffering from high computational costs. To bridge the gap, we introduce a novel fully sparse occupancy network, termed SparseOcc. SparseOcc initially reconstructs a sparse 3D representation from camera-only inputs and subsequently predicts semantic/instance occupancy from the 3D sparse representation by sparse queries. A mask-guided sparse sampling is designed to enable sparse queries to interact with 2D features in a fully sparse manner, thereby circumventing costly dense features or global attention. Additionally, we design a thoughtful ray-based evaluation metric, namely RayIoU, to solve the inconsistency penalty along the depth axis raised in traditional voxel-level mIoU criteria. SparseOcc demonstrates its…
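The depth-axis issue that motivates RayIoU can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a tiny 2D grid in which each row is one ray and each column one depth step, a hypothetical FREE label of 0, and a depth tolerance measured in voxels. It compares a voxel-level IoU with a simplified ray-level IoU that scores only the first occupied voxel hit along each ray.

```python
import numpy as np

FREE = 0  # hypothetical label for empty space (an assumption of this sketch)


def voxel_iou(pred, gt, cls):
    """Voxel-level IoU for one class: exact voxel overlap only."""
    p, g = pred == cls, gt == cls
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")


def first_hit(grid):
    """Per ray (row): depth index and class of the first non-free voxel.

    Rays that hit nothing get depth -1 and class FREE.
    """
    occupied = grid != FREE
    has_hit = occupied.any(axis=1)
    idx = occupied.argmax(axis=1)  # first True along the depth axis
    depth = np.where(has_hit, idx, -1)
    cls = np.where(has_hit, grid[np.arange(grid.shape[0]), idx], FREE)
    return depth, cls


def ray_iou(pred, gt, cls, tol):
    """Simplified ray-level IoU: a ray is a true positive when both first
    hits have class `cls` and their depths differ by at most `tol` voxels."""
    d_p, c_p = first_hit(pred)
    d_g, c_g = first_hit(gt)
    p, g = c_p == cls, c_g == cls
    tp = (p & g & (np.abs(d_p - d_g) <= tol)).sum()
    union = p.sum() + g.sum() - tp  # TP + FP + FN
    return tp / union if union else float("nan")


# Four rays, eight depth steps. The ground-truth surface sits at depth 5;
# the prediction places the same surface one voxel deeper.
gt = np.zeros((4, 8), dtype=int)
gt[:, 5] = 1
pred = np.zeros((4, 8), dtype=int)
pred[:, 6] = 1

print(voxel_iou(pred, gt, cls=1))       # no voxel overlaps
print(ray_iou(pred, gt, cls=1, tol=1))  # every ray hits within tolerance
```

In this toy case a one-voxel depth shift removes all voxel overlap, while the ray-level score with a one-voxel tolerance still counts every ray as correct. The actual RayIoU definition, including how rays are cast and which tolerances are used, is given in the paper.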

Code & Models

Models

🤗 chinmaygarde/SparseBev (model)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Optical Imaging and Spectroscopy Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings