3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald

TL;DR
3D-AVS introduces an auto-vocabulary segmentation method for 3D point clouds that generates semantic categories at runtime, eliminating the need for human-provided labels and enabling richer, scalable annotations.
Contribution
It proposes a novel auto-vocabulary segmentation approach for 3D point clouds that combines image and LiDAR data, with a new metric for evaluating unknown vocabularies.
Findings
Achieves effective segmentation on the nuScenes and ScanNet200 datasets.
Generates accurate semantic classes without human labels.
Enhances robustness under challenging lighting conditions.
Abstract
Open-Vocabulary Segmentation (OVS) methods offer promising capabilities in detecting unseen object categories, but the category must be known and needs to be provided by a human, either via a text prompt or pre-labeled datasets, thus limiting their scalability. We propose 3D-AVS, a method for Auto-Vocabulary Segmentation of 3D point clouds for which the vocabulary is unknown and auto-generated for each input at runtime, thus eliminating the human in the loop and typically providing a substantially larger vocabulary for richer annotations. 3D-AVS first recognizes semantic entities from image or point cloud data and then segments all points with the automatically generated vocabulary. Our method incorporates both image-based and point-based recognition, enhancing robustness under challenging lighting conditions where geometric information from LiDAR is especially valuable. Our point-based…
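The second stage described in the abstract, segmenting all points with the automatically generated vocabulary, can be illustrated with a minimal sketch. It assumes that per-point features and text embeddings of the vocabulary entries already live in a shared embedding space and that each point takes the entry with the highest cosine similarity. This is a common open-vocabulary labeling scheme, and the excerpt above does not confirm it as the authors' exact procedure. The function name `segment_with_vocabulary`, the toy features, and the hard-coded vocabulary are hypothetical. In the described method the vocabulary would come from the recognition stage rather than from a fixed list.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def segment_with_vocabulary(point_feats, text_feats, vocabulary):
    """Label each point with the vocabulary entry of highest cosine similarity.

    point_feats: list of N feature vectors, one per 3D point (assumed input).
    text_feats:  list of K embeddings, one per vocabulary entry (assumed input).
    vocabulary:  list of K category names, here standing in for the
                 auto-generated vocabulary of one scene.
    """
    text_unit = [_normalize(t) for t in text_feats]
    labels = []
    for feat in point_feats:
        feat_unit = _normalize(feat)
        # Cosine similarity reduces to a dot product on unit vectors.
        sims = [sum(a * b for a, b in zip(feat_unit, t)) for t in text_unit]
        labels.append(vocabulary[sims.index(max(sims))])
    return labels


# Toy data: three vocabulary entries with orthogonal embeddings, four points.
vocabulary = ["car", "tree", "road"]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_feats = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.8],
    [0.1, 0.7, 0.2],
    [2.0, 0.1, 0.1],
]
labels = segment_with_vocabulary(point_feats, text_feats, vocabulary)
print(labels)
```

Because the vocabulary is passed as an ordinary argument, the same labeling step works unchanged when the list of categories differs from one input to the next, which is the property that distinguishes an auto-generated vocabulary from a fixed label set.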
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction
Methods: Attention Pooling
