JOPP-3D: Joint Open Vocabulary Semantic Segmentation on Point Clouds and Panoramas

Sandeep Inuganti; Hideaki Kanayama; Kanta Shimizu; Mahdi Chamseddine; Soichiro Yokota; Didier Stricker; Jason Rambach

arXiv:2603.06168·cs.CV·March 13, 2026

Open Access

TL;DR

JOPP-3D introduces a novel framework for open-vocabulary semantic segmentation that jointly processes panoramic images and 3D point clouds, enabling language-based scene understanding across multiple visual modalities.

Contribution

The paper presents a method that aligns vision-language features across panoramic images and point clouds for open-vocabulary semantic segmentation.

Findings

01. Achieves state-of-the-art performance on Stanford-2D-3D-s and ToF-360 datasets.

02. Enables natural language queries to generate semantic masks across modalities.

03. Significantly outperforms existing methods in open and closed vocabulary segmentation.
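
To make the second finding concrete, here is a minimal sketch of how a text query can be turned into a mask once per-point (or per-pixel) features and text embeddings live in the same space. This is an illustration under assumptions, not the paper's implementation: the function name `segment_by_text` is hypothetical, and the feature arrays stand in for the output of some vision-language encoder, which this page does not name.

```python
import numpy as np

def segment_by_text(point_feats, text_embeds):
    """Label each point with the index of its most similar text prompt.

    point_feats: (N, D) array of per-point features (assumed precomputed).
    text_embeds: (K, D) array of prompt embeddings from the same encoder.
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sim = p @ t.T  # (N, K) similarity matrix
    return sim.argmax(axis=1), sim

# Toy example with hand-made 2-D "features" in place of real embeddings.
feats = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.1]])
prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
labels, sim = segment_by_text(feats, prompts)
mask_for_prompt_1 = labels == 1  # boolean mask for the second prompt
```

Because the labels are derived from the prompt list at query time, changing the vocabulary only requires new text embeddings, not retraining.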

Abstract

Semantic segmentation across visual modalities such as 3D point clouds and panoramic images remains a challenging task, primarily due to the scarcity of annotated data and the limited adaptability of fixed-label models. In this paper, we present JOPP-3D, an open-vocabulary semantic segmentation framework that jointly leverages panoramic and point cloud data to enable language-driven scene understanding. We convert RGB-D panoramic images into their corresponding tangential perspective images and 3D point clouds, then use these modalities to extract and align foundational vision-language features. This allows natural language querying to generate semantic masks on both input modalities. Experimental evaluation on the Stanford-2D-3D-s and ToF-360 datasets demonstrates the capability of JOPP-3D to produce coherent and semantically meaningful segmentations across panoramic and 3D domains.…
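
The panorama-to-point-cloud conversion mentioned in the abstract can be sketched as a spherical back-projection. The following assumes an equirectangular panorama, depth stored as range along each viewing ray, zero marking missing depth, and an x-right, y-up, z-forward axis convention; none of these details are given on this page, and `panorama_to_point_cloud` is a hypothetical name.

```python
import numpy as np

def panorama_to_point_cloud(depth, rgb):
    """Back-project an equirectangular RGB-D panorama into a point cloud.

    depth: (H, W) range along each ray (assumed), 0 where missing.
    rgb:   (H, W, 3) colours aligned with depth.
    Returns (N, 3) points and (N, 3) colours for valid pixels.
    """
    h, w = depth.shape
    # Pixel centres mapped to longitude in (-pi, pi) and latitude in (-pi/2, pi/2).
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h
    lon, lat = np.meshgrid((u - 0.5) * 2.0 * np.pi, (0.5 - v) * np.pi)

    # Unit ray direction for every pixel (assumed axis convention).
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )

    valid = depth > 0
    points = dirs[valid] * depth[valid][:, None]
    return points, rgb[valid]

# Toy panorama: unit range everywhere except one missing pixel.
toy_depth = np.ones((4, 8))
toy_depth[0, 0] = 0.0
toy_rgb = np.zeros((4, 8, 3), dtype=np.uint8)
pts, cols = panorama_to_point_cloud(toy_depth, toy_rgb)
```

Keeping the pixel-to-point correspondence (here via the `valid` mask) is what would let features or labels computed on one modality be transferred to the other.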

Taxonomy

Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Advanced Neural Network Applications