DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation

Zirui Wang; Tao Zhang

arXiv:2512.00226·cs.CV·December 2, 2025

TL;DR

DenseScan introduces a richly annotated 3D scene dataset with multi-level descriptions and question generation, leveraging multi-view images and large language models to enhance 3D understanding tasks.

Contribution

The paper presents a novel automated pipeline for dense semantic annotation of 3D scenes, combining geometric and semantic information to support visual-language applications.

Findings

01

Enhanced object-level understanding in 3D environments.

02

Improved question-answering performance over traditional datasets.

03

Broader applicability to downstream tasks like navigation and AR.

Abstract

3D understanding is a key capability for real-world AI assistance. High-quality data plays an important role in driving the development of the 3D understanding community. Current 3D scene understanding datasets often provide geometric and instance-level information, yet they lack the rich semantic annotations necessary for nuanced visual-language tasks. In this work, we introduce DenseScan, a novel dataset with detailed multi-level descriptions generated by an automated pipeline leveraging multi-view 2D images and multimodal large language models (MLLMs). Our approach enables dense captioning of scene elements, ensuring comprehensive object-level descriptions that capture context-sensitive details. Furthermore, we extend these annotations through scenario-based question generation, producing high-level queries that integrate object properties, spatial relationships, and scene context. By…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications