JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas
Sandeep Inuganti, Hideaki Kanayama, Kanta Shimizu, Mahdi Chamseddine, Soichiro Yokota, Didier Stricker, Jason Rambach

TL;DR
JOPP-3D is an open-vocabulary semantic segmentation framework that jointly processes panoramic images and 3D point clouds, enabling language-driven scene understanding in both modalities.
Contribution
The paper presents a method that extracts foundational vision-language features from panoramic images and point clouds and aligns them across the two modalities, enabling open-vocabulary semantic segmentation.
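The summary does not spell out how the features are aligned. The sketch below shows one generic way to attach 2D vision-language features to 3D points, under stated assumptions. For each point, collect the feature of every view pixel it projects to, L2-normalize each one, average them, and renormalize the result. The input format (`(point_id, feature)` pairs) and the function names are hypothetical. This is an illustration of multi-view averaging, not JOPP-3D's published alignment procedure.

```python
import math

def l2_normalize(vec):
    """Return vec scaled to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_point_features(observations):
    """Fuse per-view features into one unit-length embedding per 3D point.

    observations: iterable of (point_id, feature) pairs, one for every view
    in which the point is visible (hypothetical input format).
    Returns {point_id: fused unit-length feature}.
    """
    sums = {}
    for pid, feat in observations:
        f = l2_normalize(feat)          # give every view equal weight
        if pid not in sums:
            sums[pid] = [0.0] * len(f)
        acc = sums[pid]
        for i, x in enumerate(f):
            acc[i] += x                 # accumulate normalized features
    # Renormalize so fused features are comparable to text embeddings by dot product.
    return {pid: l2_normalize(acc) for pid, acc in sums.items()}
```

Normalizing before averaging keeps views with large-magnitude features from dominating. Normalizing afterwards lets the fused vector be scored against a text embedding with a plain dot product.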
Findings
Achieves state-of-the-art performance on the Stanford-2D-3D-s and ToF-360 datasets.
Enables natural-language queries to generate semantic masks in both modalities.
Significantly outperforms existing methods in both open- and closed-vocabulary segmentation.
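Language-driven masks of this kind are typically produced by comparing each point or pixel embedding with the text embedding of a query. The sketch below assumes both embeddings already live in a shared space, as with CLIP-style encoders; the encoders themselves are not included. It scores them with cosine similarity and takes an arg-max over the candidate labels. `query_labels`, `binary_mask`, and the optional `threshold` are illustrative names, not part of any released JOPP-3D code.

```python
import math

def l2_normalize(vec):
    """Return vec scaled to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def query_labels(point_feats, text_feats, labels, threshold=None):
    """Label each point/pixel with its most similar text query.

    point_feats: per-point (or per-pixel) embeddings, each of dimension D.
    text_feats:  one D-dim text embedding per entry of `labels` (assumed precomputed).
    threshold:   optional minimum cosine similarity; below it the result is None.
    """
    result = []
    for f in point_feats:
        sims = [cosine(f, t) for t in text_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        if threshold is not None and sims[best] < threshold:
            result.append(None)          # no query matches confidently enough
        else:
            result.append(labels[best])
    return result

def binary_mask(point_feats, text_feat, threshold):
    """Mask for a single free-form query: True where similarity >= threshold."""
    return [cosine(f, text_feat) >= threshold for f in point_feats]
```

The same functions apply unchanged to point features and to pixel features, which is what lets a single query yield masks in both modalities once the features share an embedding space.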
Abstract
Semantic segmentation across visual modalities such as 3D point clouds and panoramic images remains a challenging task, primarily due to the scarcity of annotated data and the limited adaptability of fixed-label models. In this paper, we present JOPP-3D, an open-vocabulary semantic segmentation framework that jointly leverages panoramic and point cloud data to enable language-driven scene understanding. We convert RGB-D panoramic images into their corresponding tangential perspective images and 3D point clouds, then use these modalities to extract and align foundational vision-language features. This allows natural language querying to generate semantic masks on both input modalities. Experimental evaluation on the Stanford-2D-3D-s and ToF-360 datasets demonstrates the capability of JOPP-3D to produce coherent and semantically meaningful segmentations across panoramic and 3D domains.…
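The conversion step described in the abstract rests on standard spherical geometry. The sketch below makes three assumptions. The panorama is equirectangular. Its depth channel stores Euclidean range along each viewing ray. Longitude spans the image width and latitude its height, with y pointing up. The axis convention and the function names are hypothetical. `panorama_to_points` back-projects depth to a point cloud while keeping each point's source pixel. `gnomonic` is the projection that defines a tangential perspective image centred at `(lon0, lat0)`.

```python
import math

def pixel_to_lonlat(u, v, width, height):
    """Pixel centre (u, v) of an equirectangular image -> (lon, lat) in radians.

    Assumed convention: longitude spans [-pi, pi) left to right,
    latitude spans +pi/2 (top row) to -pi/2 (bottom row).
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat

def lonlat_to_ray(lon, lat):
    """Unit viewing direction; assumed axes: y up, z forward at lon = 0."""
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def panorama_to_points(depth):
    """Back-project a depth panorama (list of rows, metres) to (x, y, z, u, v) tuples.

    Assumes depth is range along the ray; pixels with depth <= 0 are skipped.
    Keeping (u, v) preserves the pixel correspondence needed to lift colours
    or features from the image onto the points.
    """
    height, width = len(depth), len(depth[0])
    points = []
    for v in range(height):
        for u in range(width):
            d = depth[v][u]
            if d <= 0:
                continue                 # invalid / missing depth
            x, y, z = lonlat_to_ray(*pixel_to_lonlat(u, v, width, height))
            points.append((d * x, d * y, d * z, u, v))
    return points

def gnomonic(lon, lat, lon0, lat0):
    """Project direction (lon, lat) onto the plane tangent to the sphere at (lon0, lat0).

    Returns plane coordinates (x, y), or None for directions 90 degrees or more
    away from the tangent point, which a tangent image cannot show.
    """
    dlon = lon - lon0
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    if cos_c <= 0:
        return None
    x = math.cos(lat) * math.sin(dlon) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(dlon)) / cos_c
    return x, y
```

To render an actual tangent image, one would normally invert `gnomonic` for each output pixel and sample the panorama at the resulting `(lon, lat)`. The forward form is shown here because it states the geometry most compactly.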
Taxonomy
Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Advanced Neural Network Applications
