LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction

Yuhang Gao; Xiang Xiang; Sheng Zhong; Guoyou Wang

arXiv:2510.22141 · cs.CV · October 28, 2025


TL;DR

LOC is a versatile language-guided framework for open-set 3D occupancy prediction that combines multi-modal data, contrastive learning, and semantic reasoning to improve scene understanding and unknown object recognition.

Contribution

The paper introduces LOC, a novel framework that integrates language guidance, multi-frame LiDAR fusion, and contrastive learning for open-set 3D occupancy prediction.
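As a rough illustration of how language guidance and contrastive learning can interact at the voxel level, the sketch below computes a CLIP-style cross-entropy between voxel features and one text embedding per class. This is not the paper's DCL formulation: the function name `voxel_text_contrastive_loss`, the temperature value, and the use of plain NumPy arrays in place of real network and text-encoder outputs are all assumptions made for the example.

```python
import numpy as np

def voxel_text_contrastive_loss(voxel_feats, text_embeds, voxel_labels, temperature=0.07):
    """Hypothetical voxel-to-text contrastive loss (illustrative, not the paper's DCL).

    voxel_feats:  (V, D) features, one per voxel
    text_embeds:  (C, D) embeddings, one per class prompt
    voxel_labels: (V,)   integer class index for each voxel
    """
    voxel_feats = np.asarray(voxel_feats, dtype=float)
    text_embeds = np.asarray(text_embeds, dtype=float)
    voxel_labels = np.asarray(voxel_labels, dtype=int)

    # L2-normalise so the dot product is a cosine similarity.
    v = voxel_feats / np.linalg.norm(voxel_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)

    # Similarity of every voxel to every class prompt, scaled by the temperature.
    logits = v @ t.T / temperature                      # (V, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability

    # Log-softmax over classes, then pick the log-probability of the true class.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(voxel_labels)), voxel_labels].mean())
```

Under these assumptions, the loss pulls each voxel feature toward the prompt embedding of its own class and pushes it away from the other prompts, so features that already match their prompts yield a loss near zero.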

Findings

01. Achieves high-precision predictions for known classes.

02. Effectively distinguishes unknown classes without extra training.

03. Demonstrates superior performance on the nuScenes dataset.

Abstract

Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, uses Poisson reconstruction to fill voids, and assigns semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition…
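The KNN step mentioned in the abstract can be pictured with a minimal sketch: each voxel centre takes the majority label of its k nearest labelled LiDAR points. The function name `knn_voxel_labels`, the brute-force distance computation, the majority-vote rule, and the default `k=3` are assumptions for illustration; the abstract does not specify these details, and a real pipeline would likely use a spatial index rather than a dense distance matrix.

```python
import numpy as np

def knn_voxel_labels(voxel_centers, points, point_labels, k=3, num_classes=None):
    """Hypothetical helper: label each voxel by majority vote over its k nearest points.

    voxel_centers: (V, 3) voxel centre coordinates
    points:        (N, 3) labelled LiDAR points (e.g. after multi-frame fusion)
    point_labels:  (N,)   integer semantic label per point
    """
    voxel_centers = np.asarray(voxel_centers, dtype=float)
    points = np.asarray(points, dtype=float)
    point_labels = np.asarray(point_labels, dtype=int)
    if num_classes is None:
        num_classes = int(point_labels.max()) + 1
    k = min(k, len(points))

    # Brute-force squared distances, shape (V, N). Fine for a toy example only.
    d2 = ((voxel_centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)

    # Indices of the k nearest points for each voxel (order within the k is arbitrary).
    nn_idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    nn_labels = point_labels[nn_idx]                    # (V, k)

    # Count votes per class for each voxel, then take the most frequent label.
    votes = np.zeros((len(voxel_centers), num_classes), dtype=int)
    rows = np.arange(len(voxel_centers))[:, None]
    np.add.at(votes, (rows, nn_labels), 1)
    return votes.argmax(axis=1)
```

For example, with two well-separated clusters of points labelled 0 and 1, a voxel centred inside each cluster receives that cluster's label. Ties are broken toward the lower class index by `argmax`, which is an arbitrary choice of this sketch.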
