LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction
Yuhang Gao, Xiang Xiang, Sheng Zhong, Guoyou Wang

TL;DR
LOC is a versatile language-guided framework for open-set 3D occupancy prediction that combines multi-modal data, contrastive learning, and semantic reasoning to improve scene understanding and unknown object recognition.
Contribution
The paper introduces LOC, a novel framework that integrates language guidance, multi-frame LiDAR fusion, and contrastive learning for open-set 3D occupancy prediction.
Findings
Achieves high-precision predictions for known classes.
Effectively distinguishes unknown classes without extra training.
Demonstrates superior performance on the nuScenes dataset.
Abstract
Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, using Poisson reconstruction to fill voids, and assigning semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition…
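The KNN step mentioned in the abstract, assigning semantics to voxels from labeled LiDAR points, can be illustrated with a small sketch. This is not the authors' implementation: the function name `knn_voxel_labels`, the brute-force distance computation, the majority-vote rule, and the choice of `k` are all assumptions made for illustration, and the multi-frame fusion and Poisson reconstruction stages are omitted.

```python
import numpy as np


def knn_voxel_labels(voxel_centers, points, point_labels, k=3):
    """Hypothetical helper: label each voxel by majority vote over its
    k nearest labeled points (brute force, small inputs only)."""
    # Squared Euclidean distances, shape (num_voxels, num_points).
    diff = voxel_centers[:, None, :] - points[None, :, :]
    dist2 = np.einsum("vpd,vpd->vp", diff, diff)
    # Indices of the k closest points per voxel (order within the k is arbitrary).
    nearest = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    neighbor_labels = point_labels[nearest]  # shape (num_voxels, k)
    num_classes = int(point_labels.max()) + 1
    # Count votes per class for each voxel; ties resolve to the lowest class id.
    votes = np.apply_along_axis(
        np.bincount, 1, neighbor_labels, minlength=num_classes
    )
    return votes.argmax(axis=1)


# Toy data (made up): two small point clusters with labels 0 and 1.
points = np.array(
    [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.0, 0.2, 0.0],
     [10.0, 10.0, 10.0], [10.2, 10.0, 10.0], [10.0, 10.2, 10.0]]
)
point_labels = np.array([0, 0, 0, 1, 1, 1])
voxel_centers = np.array([[0.1, 0.0, 0.0], [9.9, 10.0, 10.0]])

voxel_labels = knn_voxel_labels(voxel_centers, points, point_labels, k=3)
print(voxel_labels.tolist())
```

The brute-force distance matrix is only practical for small inputs; for real LiDAR sweeps a spatial index such as a k-d tree would be the usual substitute.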
