OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

Youjun Zhao; Jiaying Lin; Shuquan Ye; Qianshi Pang; Rynson W.H. Lau

arXiv:2408.11030 · cs.CV · November 26, 2025

TL;DR

OpenScan introduces a comprehensive benchmark for evaluating generalized open-vocabulary 3D scene understanding, emphasizing the need for models to interpret diverse linguistic attributes beyond object classes.

Contribution

The paper proposes the GOV-3D task and introduces the OpenScan benchmark, expanding evaluation to include fine-grained attributes like affordance, property, and material.

Findings

01. State-of-the-art OV-3D methods perform poorly on GOV-3D tasks.

02. Existing methods struggle with abstract vocabulary understanding.

03. Scaling object classes alone is insufficient for scene comprehension.

Abstract

Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond the closed set of object classes. However, existing approaches and benchmarks primarily focus on the open-vocabulary problem within the context of object classes, which is insufficient for a holistic evaluation of the extent to which a model understands the 3D scene. In this paper, we introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) to explore the open-vocabulary problem beyond object classes. It encompasses an open and diverse set of generalized knowledge, expressed as linguistic queries of fine-grained and object-specific attributes. To this end, we contribute a new benchmark named OpenScan, which consists of 3D object attributes across eight representative linguistic aspects, including affordance, property, and material. We…

Code & Models

Repositories

youjunzhao/openscan (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training · Focus