Open-vocabulary 3D scene perception in industrial environments
Keno Moenck, Adrian Philip Florea, Julian Koch, Thorsten Schüppstuhl

TL;DR
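This paper introduces a training-free, open-vocabulary 3D perception pipeline for industrial environments. It sidesteps the poor generalization of class-agnostic segmentation models pre-trained on non-industrial data by merging pre-computed superpoints into masks based on their semantic features, and it evaluates a domain-adapted Vision-Language Foundation Model (VLFM) for querying industrial objects.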
Contribution
The work presents a training-free 3D perception method that merges superpoints using features from a domain-adapted VLFM, segmenting industrial objects without relying on pre-trained class-agnostic segmentation models.
Findings
Successful segmentation of industrial objects in 3D scenes
Demonstrated limitations of existing models on industrial data
Effective use of domain-adapted VLFM for open-vocabulary querying
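As an illustration of what open-vocabulary querying amounts to, the sketch below ranks 3D masks by the cosine similarity between each mask's feature vector and a text embedding. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `query_masks`, the `top_k` parameter, and the toy 2D vectors are hypothetical, and the embeddings are assumed to come from some CLIP-style model that is not called here.

```python
import numpy as np

def query_masks(mask_features, text_embedding, top_k=1):
    """Rank masks against a text query by cosine similarity.

    mask_features: (M, D) array, one feature vector per mask (assumed given).
    text_embedding: (D,) array for the query text (assumed given).
    Returns (indices, scores) of the top_k best-matching masks.
    """
    m = np.asarray(mask_features, dtype=float)
    t = np.asarray(text_embedding, dtype=float)
    # Normalize so that the dot product equals the cosine similarity.
    m = m / np.clip(np.linalg.norm(m, axis=1, keepdims=True), 1e-12, None)
    t = t / max(np.linalg.norm(t), 1e-12)
    scores = m @ t
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy example: three masks, one query vector (hypothetical values).
masks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]
indices, scores = query_masks(masks, query, top_k=2)
```

Because the class list is never fixed in advance, a new object category only needs a new text embedding; the mask features stay unchanged.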
Abstract
Autonomous vision applications in production, intralogistics, or manufacturing environments require perception capabilities beyond a small, fixed set of classes. Recent open-vocabulary methods, leveraging 2D Vision-Language Foundation Models (VLFMs), target this task but often rely on class-agnostic segmentation models pre-trained on non-industrial datasets (e.g., household scenes). In this work, we first demonstrate that such models fail to generalize, performing poorly on common industrial objects. Therefore, we propose a training-free, open-vocabulary 3D perception pipeline that overcomes this limitation. Instead of using a pre-trained model to generate instance proposals, our method simply generates masks by merging pre-computed superpoints based on their semantic features. We then evaluate the domain-adapted VLFM "IndustrialCLIP" on a representative 3D industrial workshop…
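The idea of merging superpoints by semantic similarity can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each superpoint already has a semantic feature vector and that an adjacency list between superpoints is available, and it merges adjacent superpoints whose cosine similarity exceeds a fixed threshold using union-find. The function name `merge_superpoints`, the `threshold` value, and the toy data are hypothetical.

```python
import numpy as np

def merge_superpoints(features, edges, threshold=0.9):
    """Group superpoints into masks by feature similarity.

    features: (S, D) array, one semantic feature per superpoint (assumed given).
    edges: iterable of (i, j) index pairs of adjacent superpoints (assumed given).
    threshold: cosine similarity needed to merge two neighbors (assumed value).
    Returns an (S,) array assigning a mask id to each superpoint.
    """
    feats = np.asarray(features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)
    parent = list(range(len(unit)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        # Merge neighbors whose features point in nearly the same direction.
        if float(unit[i] @ unit[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    roots = np.array([find(i) for i in range(len(unit))])
    # Relabel roots to consecutive mask ids 0..K-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Toy example: four superpoints in a chain, two semantic groups.
feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]
adjacency = [(0, 1), (1, 2), (2, 3)]
mask_ids = merge_superpoints(feats, adjacency)
```

Note that this single-pass variant compares the original per-superpoint features only; a scheme that re-aggregates features after each merge would behave differently on long chains of gradually changing features.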
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
