OccLE: Label-Efficient 3D Semantic Occupancy Prediction
Naiyu Fang, Zheyuan Zhou, Fayao Liu, Xulei Yang, Jiacheng Wei, Lemiao Qiu, Hongsheng Li, Guosheng Lin

TL;DR
OccLE introduces a label-efficient method for 3D semantic occupancy prediction that combines semantic and geometric learning from images and LiDAR, achieving high performance with minimal voxel annotations.
Contribution
The paper proposes a novel framework that decouples semantic and geometric learning and fuses the features from both branches for 3D occupancy prediction, keeping performance high while requiring only limited voxel annotations.
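The page does not say how the two feature grids are fused. The following minimal sketch assumes the simplest option: per-voxel channel concatenation followed by a linear classification head. All names here (`fuse_voxel_features`, `predict_grid`, the weights) are hypothetical, and grids are flattened lists of per-voxel feature vectors to keep the example dependency-free.

```python
from typing import List

Vector = List[float]


def fuse_voxel_features(sem: Vector, geo: Vector,
                        weights: List[Vector], bias: Vector) -> Vector:
    """Concatenate one voxel's semantic and geometric features, then apply
    a linear head to produce per-class logits (hypothetical fusion scheme)."""
    fused = sem + geo  # channel-wise concatenation of the two branches
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


def predict_grid(sem_grid: List[Vector], geo_grid: List[Vector],
                 weights: List[Vector], bias: Vector) -> List[int]:
    """Return the argmax class for every voxel of two aligned, flattened grids."""
    assert len(sem_grid) == len(geo_grid), "grids must cover the same voxels"
    preds = []
    for sem, geo in zip(sem_grid, geo_grid):
        logits = fuse_voxel_features(sem, geo, weights, bias)
        preds.append(max(range(len(logits)), key=logits.__getitem__))
    return preds


# Two voxels, 2 semantic channels + 1 geometric channel, 2 illustrative classes.
weights = [[0.0, 0.0, -1.0],  # class 0 (e.g. "empty")
           [1.0, 0.5, 1.0]]   # class 1 (e.g. "car")
bias = [0.5, -0.5]
sem_grid = [[0.1, 0.2], [0.9, 0.4]]
geo_grid = [[0.0], [1.0]]
print(predict_grid(sem_grid, geo_grid, weights, bias))  # [0, 1]
```

In a real model the head would be learned and the grids would be 3D tensors, but the per-voxel logic is the same: each output class sees both the semantic and the geometric evidence for that voxel.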
Findings
Achieves competitive results with only 10% voxel annotations.
Utilizes pseudo labels from 2D foundation models for semantic learning.
Employs semi-supervision and feature fusion for improved geometric understanding.
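Training with only 10% of voxels annotated means the supervised loss can only be computed on the labeled subset. Below is a minimal sketch of that idea, assuming annotated voxels are sampled uniformly at random (the page does not specify the sampling scheme). `sample_label_mask` and `masked_cross_entropy` are hypothetical helpers. The semi-supervised signal used for the remaining voxels is not shown.

```python
import math
import random
from typing import List


def sample_label_mask(num_voxels: int, ratio: float, seed: int = 0) -> List[bool]:
    """Mark a random `ratio` fraction of voxels as annotated (assumed uniform sampling)."""
    rng = random.Random(seed)
    k = round(num_voxels * ratio)
    chosen = set(rng.sample(range(num_voxels), k))
    return [i in chosen for i in range(num_voxels)]


def masked_cross_entropy(logits: List[List[float]], labels: List[int],
                         mask: List[bool]) -> float:
    """Mean cross-entropy over annotated voxels only; unlabeled voxels add nothing."""
    total, count = 0.0, 0
    for z, y, labeled in zip(logits, labels, mask):
        if not labeled:
            continue
        mx = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = mx + math.log(sum(math.exp(v - mx) for v in z))
        total += log_sum_exp - z[y]
        count += 1
    return total / count if count else 0.0


mask = sample_label_mask(100, 0.1)
print(sum(mask))  # 10 annotated voxels out of 100
```

Because unlabeled voxels are skipped entirely, their predictions receive no gradient from this term. This is why a label-efficient method needs additional signals, such as pseudo labels or semi-supervision, to cover the remaining voxels.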
Abstract
3D semantic occupancy prediction offers intuitive and efficient scene understanding and has attracted significant interest in autonomous driving perception. Existing approaches either rely on full supervision, which demands costly voxel-level annotations, or on self-supervision, which provides limited guidance and yields suboptimal performance. To address these challenges, we propose OccLE, a Label-Efficient 3D Semantic Occupancy Prediction method that takes images and LiDAR as inputs and maintains high performance with limited voxel annotations. Our intuition is to decouple the semantic and geometric learning tasks and then fuse the learned feature grids from both tasks for the final semantic occupancy prediction. Specifically, the semantic branch distills a 2D foundation model to provide aligned pseudo labels for 2D and 3D semantic learning. The geometric branch integrates image and LiDAR…
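One common way to align 2D pseudo labels with 3D points is to project each LiDAR point into the image and read the label at the resulting pixel. The abstract does not describe OccLE's exact alignment procedure, so treat the following as a minimal sketch under stated assumptions:

- points are already transformed into the camera frame (z forward);
- the camera follows a pinhole model with intrinsics `fx, fy, cx, cy`;
- `label_map` stands in for a per-pixel segmentation produced by a 2D foundation model.

Function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z pointing forward


def project_to_pixel(p: Point, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to integer pixel (u, v)."""
    x, y, z = p
    if z <= 0:
        return None  # point is behind the camera, so it has no valid pixel
    u = fx * x / z + cx
    v = fy * y / z + cy
    return math.floor(u), math.floor(v)


def lift_pseudo_labels(points: List[Point], label_map: List[List[int]],
                       fx: float, fy: float, cx: float, cy: float,
                       ignore: int = -1) -> List[int]:
    """Assign each 3D point the 2D pseudo label at its projected pixel,
    or `ignore` if it falls behind the camera or outside the image."""
    h, w = len(label_map), len(label_map[0])
    labels = []
    for p in points:
        pixel = project_to_pixel(p, fx, fy, cx, cy)
        if pixel is not None and 0 <= pixel[0] < w and 0 <= pixel[1] < h:
            u, v = pixel
            labels.append(label_map[v][u])  # row index is v, column index is u
        else:
            labels.append(ignore)
    return labels


label_map = [[1, 2],
             [3, 4]]  # tiny 2x2 "segmentation" for illustration
points = [(0.0, 0.0, 1.0), (-0.5, -0.5, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
print(lift_pseudo_labels(points, label_map, fx=1.0, fy=1.0, cx=1.0, cy=1.0))
# [4, 1, -1, -1]
```

Points marked `ignore` would simply be excluded from the 3D semantic loss. Because the same pixel label supervises both the 2D prediction and the 3D point, the two sets of pseudo labels stay consistent by construction.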
