DC-Scene: Data-Centric Learning for 3D Scene Understanding

Ting Huang; Zeyu Zhang; Ruicheng Zhang; Yang Zhao

arXiv:2505.15232·cs.CV·May 22, 2025

TL;DR

DC-Scene is a data-centric framework for 3D scene understanding. It improves training efficiency and performance by filtering for high-quality data, which reduces reliance on large datasets. Results are demonstrated on ScanRefer and Nr3D.

Contribution

The paper proposes a CLIP-driven dual-indicator quality (DIQ) filter and a curriculum scheduler to improve data quality and training efficiency in 3D scene understanding.

Findings

01  Achieves a state-of-the-art CIDEr score of 86.1 using only the top-75% data subset.

02  Reduces training cost by approximately two-thirds.

03  Training on the filtered high-quality subset outperforms training on the full dataset.

Abstract

3D scene understanding plays a fundamental role in vision applications such as robotics, autonomous driving, and augmented reality. However, advancing learning-based 3D scene understanding remains challenging due to two key limitations: (1) the large scale and complexity of 3D scenes lead to higher computational costs and slower training compared to 2D counterparts; and (2) high-quality annotated 3D datasets are significantly scarcer than those available for 2D vision. These challenges underscore the need for more efficient learning paradigms. In this work, we propose DC-Scene, a data-centric framework tailored for 3D scene understanding, which emphasizes enhancing data quality and training efficiency. Specifically, we introduce a CLIP-driven dual-indicator quality (DIQ) filter, combining vision-language alignment scores with caption-loss perplexity, along with a curriculum scheduler…
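
The abstract describes the DIQ filter as combining a vision-language alignment score with caption-loss perplexity, but this excerpt does not say how the two indicators are combined. The following is only a minimal sketch under assumed choices: each indicator is min-max normalised, lower perplexity is treated as higher quality, the two are averaged with a hypothetical weight `alpha`, and the top 75% of samples are kept. The function names (`min_max`, `diq_scores`, `filter_top_fraction`) and the sample values are illustrative and are not taken from the paper or its repository.

```python
import math


def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant lists map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def diq_scores(alignment, perplexity, alpha=0.5):
    """Combine two per-sample indicators into one quality score.

    ASSUMPTION: equal-weight averaging of normalised indicators; the
    paper's actual combination rule is not given in this excerpt.
    """
    if len(alignment) != len(perplexity):
        raise ValueError("indicator lists must have the same length")
    align_n = min_max(alignment)
    # ASSUMPTION: lower caption perplexity means a cleaner sample,
    # so negative log-perplexity is used as the second indicator.
    ppl_n = min_max([-math.log(p) for p in perplexity])
    return [alpha * a + (1 - alpha) * p for a, p in zip(align_n, ppl_n)]


def filter_top_fraction(scores, fraction=0.75):
    """Return indices of the highest-scoring samples, best first."""
    keep = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Hypothetical indicator values for four scene-caption pairs.
alignment = [0.31, 0.12, 0.27, 0.05]   # e.g. CLIP-style similarity
perplexity = [8.0, 40.0, 10.0, 90.0]   # caption-loss perplexity

scores = diq_scores(alignment, perplexity)
kept = filter_top_fraction(scores, fraction=0.75)
print(kept)  # [0, 2, 1]
```

Because the kept indices are returned from highest to lowest score, the same ranking could also serve as one possible ordering for a curriculum scheduler, although the scheduling rule DC-Scene actually uses is not described in this excerpt.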

Code & Models

Repositories

aigeeksgroup/dc-scene (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · Advanced Vision and Imaging

Methods: Sparse Evolutionary Training