LOSC: LiDAR Open-voc Segmentation Consolidator
Nermin Samet, Gilles Puy, Renaud Marlet

TL;DR
LOSC refines the noisy, sparse 3D lidar labels obtained by back-projecting image-based VLM predictions, enforcing spatio-temporal consistency and robustness to image-level augmentations, then trains a 3D network on the refined labels to reach state-of-the-art zero-shot open-vocabulary segmentation in driving scenarios.
Contribution
The paper presents LOSC, a label consolidation approach for open-vocabulary lidar segmentation. It refines labels back-projected from images and then trains a 3D network on them, outperforming prior zero-shot open-vocabulary methods on nuScenes and SemanticKITTI.
Findings
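LOSC achieves state-of-the-art zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI.
Consolidating back-projected labels for spatio-temporal consistency and robustness to image-level augmentations gives less noisy supervision for training the 3D network.
The margins over previous zero-shot open-vocabulary approaches are significant on both datasets.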
Abstract
We study the use of image-based Vision-Language Models (VLMs) for open-vocabulary segmentation of lidar scans in driving settings. Classically, image semantics can be back-projected onto 3D point clouds. Yet, resulting point labels are noisy and sparse. We consolidate these labels to enforce both spatio-temporal consistency and robustness to image-level augmentations. We then train a 3D network based on these refined labels. This simple method, called LOSC, outperforms the SOTA of zero-shot open-vocabulary semantic and panoptic segmentation on both nuScenes and SemanticKITTI, with significant margins. Code is available at https://github.com/valeoai/LOSC.
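The back-projection step described in the abstract, and the general idea of consolidating several noisy label sets, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `IGNORE` value, the pinhole camera model, and the use of a simple per-point majority vote as a stand-in for consolidation are all assumptions made for illustration. The actual LOSC procedure is in the linked repository.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for points with no image evidence


def back_project_labels(points, K, T_cam_from_lidar, label_map):
    """Give each lidar point the label of the pixel it projects to.

    points: (n, 3) lidar coordinates; K: (3, 3) pinhole intrinsics;
    T_cam_from_lidar: (4, 4) rigid transform; label_map: (h, w) integer labels.
    Points behind the camera or outside the image keep IGNORE.
    """
    n = points.shape[0]
    labels = np.full(n, IGNORE, dtype=np.int64)
    # Move points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points, np.ones((n, 1))])
    cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6
    # Pinhole projection to pixel coordinates.
    uvw = (K @ cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    h, w = label_map.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = label_map[v[inside], u[inside]]
    return labels


def majority_vote(label_sets, num_classes):
    """Per-point majority vote over several label sets, skipping IGNORE.

    A simple stand-in for consolidation (an assumption, not LOSC itself):
    each label set could come from a different augmented image or scan.
    """
    stacked = np.stack(label_sets)  # shape (views, n)
    n = stacked.shape[1]
    counts = np.zeros((n, num_classes), dtype=np.int64)
    for view in stacked:
        valid = view != IGNORE
        np.add.at(counts, (np.flatnonzero(valid), view[valid]), 1)
    voted = counts.argmax(axis=1)
    voted[counts.sum(axis=1) == 0] = IGNORE  # no vote at all stays unlabeled
    return voted
```

In this sketch, points that never receive a label remain `IGNORE`, which mirrors the sparsity the abstract mentions: only points visible in some camera image get supervision from back-projection alone.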
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
