IGLOSS: Image Generation for Lidar Open-vocabulary Semantic Segmentation

Nermin Samet; Gilles Puy; Renaud Marlet

arXiv:2604.01361·cs.CV·April 3, 2026

TL;DR

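This paper introduces IGLOSS, a zero-shot open-vocabulary semantic segmentation method for 3D automotive lidar data. Instead of matching points against text embeddings, it generates prototype images from text and matches 3D point features against the features of those images, reaching state-of-the-art results on nuScenes and SemanticKITTI.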

Contribution

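The method generates prototype images from text to circumvent the image-text modality gap intrinsic to approaches based on Vision Language Models (VLMs) such as CLIP. Labeling then happens entirely in image feature space, using a 3D network distilled from a 2D Vision Foundation Model (VFM).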

Findings

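01 Achieves state-of-the-art results for open-vocabulary semantic segmentation on the nuScenes and SemanticKITTI datasets.

02 Generates prototype images from text, then labels a point cloud by matching 3D point features with the 2D image features of these prototypes.

03 Provides code, pre-trained models, and generated images at https://github.com/valeoai/IGLOSS.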

Abstract

This paper presents a new method for the zero-shot open-vocabulary semantic segmentation (OVSS) of 3D automotive lidar data. To circumvent the recognized image-text modality gap that is intrinsic to approaches based on Vision Language Models (VLMs) such as CLIP, our method relies instead on image generation from text, to create prototype images. Given a 3D network distilled from a 2D Vision Foundation Model (VFM), we then label a point cloud by matching 3D point features with 2D image features of these prototypes. Our method is state-of-the-art for OVSS on nuScenes and SemanticKITTI. Code, pre-trained models, and generated images are available at https://github.com/valeoai/IGLOSS.
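The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that 3D point features are matched with 2D image features of the prototypes, so the choices below (averaging the features of several generated images per class, cosine similarity, and nearest-prototype assignment) are assumptions made for illustration. The function names `l2_normalize`, `class_prototypes`, and `label_points` are hypothetical, and the toy vectors stand in for features that would in practice come from the 2D VFM (for generated images) and the distilled 3D network (for lidar points).

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(image_features_by_class):
    """Build one prototype per class from the features of its generated images.

    Assumption: the prototype is the normalized mean of the normalized
    image features. The paper may aggregate prototypes differently.
    """
    prototypes = {}
    for name, feats in image_features_by_class.items():
        normed = [l2_normalize(f) for f in feats]
        dim = len(normed[0])
        mean = [sum(f[i] for f in normed) / len(normed) for i in range(dim)]
        prototypes[name] = l2_normalize(mean)
    return prototypes


def label_points(point_features, prototypes):
    """Assign each point the class whose prototype has the highest cosine similarity."""
    labels = []
    for feat in point_features:
        p = l2_normalize(feat)
        best = max(
            prototypes,
            key=lambda name: sum(a * b for a, b in zip(p, prototypes[name])),
        )
        labels.append(best)
    return labels


# Toy 3-dimensional features; real features would be high-dimensional.
image_features_by_class = {
    "car": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "vegetation": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.0]],
}
point_features = [[2.0, 0.1, 0.0], [0.0, 0.5, 0.1], [0.3, 0.2, 0.0]]

prototypes = class_prototypes(image_features_by_class)
labels = label_points(point_features, prototypes)
print(labels)
```

Because point and image features live in the same VFM feature space under this setup, no text embedding enters the comparison, which is the sense in which the approach sidesteps the image-text modality gap.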

Code & Models

Repositories

valeoai/IGLOSS (GitHub): https://github.com/valeoai/IGLOSS
