Disc3D: Automatic Curation of High-Quality 3D Dialog Data via Discriminative Object Referring

Siyuan Wei; Chunjie Wang; Xiao Liu; Xiaosheng Yan; Zhishan Zhou; Rui Huang

arXiv:2511.18817 · cs.CV · January 21, 2026

TL;DR

Disc3D introduces an automated pipeline that creates high-quality, unambiguous 3D scene-dialogue datasets by combining rule-based methods with large language models, significantly reducing annotation costs.

Contribution

The paper presents a fully automated, scalable pipeline for generating high-quality 3D dialogue data that resolves viewpoint ambiguity and object referring ambiguity without human intervention.

Findings

01. Training with Disc3D improves benchmark performance

02. Produces over 2 million diverse 3D dialogue samples

03. Enhances 3D MLLMs across multiple tasks

Abstract

3D Multi-modal Large Language Models (MLLMs) still lag behind their 2D peers, largely because large-scale, high-quality 3D scene-dialogue datasets remain scarce. Prior efforts hinge on expensive human annotation and leave two key ambiguities unresolved: viewpoint ambiguity, where spatial language presumes unknown camera poses, and object referring ambiguity, where non-exclusive descriptions blur the line between targets and distractors. We therefore present a fully automated pipeline that converts raw 3D scans into unambiguous, high-quality dialogue data at a fraction of the previous cost. By synergizing rule-based constraints with 2D MLLMs and LLMs, the pipeline enables controllable, scalable generation without human intervention. The pipeline comprises four stages: (1) meta-annotation collection harvesting object-, frame-, and scene-level captions, (2) scene graph construction with…
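The object referring ambiguity described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the `SceneObject` record, its attribute names, and the helper functions `matches`, `referents`, and `is_discriminative` are all hypothetical, and the toy scene is invented for illustration. The sketch only shows the general idea that a description is unambiguous when it matches the target object and no distractor.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    # Hypothetical record; field and attribute names are illustrative only.
    object_id: int
    category: str
    attributes: dict = field(default_factory=dict)


def matches(obj, description):
    """Return True if obj satisfies every key/value pair in description."""
    for key, value in description.items():
        actual = obj.category if key == "category" else obj.attributes.get(key)
        if actual != value:
            return False
    return True


def referents(scene, description):
    """IDs of all objects in the scene that the description could refer to."""
    return [obj.object_id for obj in scene if matches(obj, description)]


def is_discriminative(scene, target_id, description):
    """A description is unambiguous if it matches the target and nothing else."""
    return referents(scene, description) == [target_id]


# Toy scene (invented): two chairs and one table.
scene = [
    SceneObject(1, "chair", {"color": "red"}),
    SceneObject(2, "chair", {"color": "blue"}),
    SceneObject(3, "table", {"color": "red"}),
]

# "the chair" is non-exclusive: object 2 is a distractor.
ambiguous = is_discriminative(scene, 1, {"category": "chair"})
# "the red chair" singles out object 1.
unambiguous = is_discriminative(scene, 1, {"category": "chair", "color": "red"})
```

In this toy scene, "the chair" could refer to either chair, while "the red chair" picks out a single object. A rule-based filter of this kind is one plausible way to reject non-exclusive descriptions before they reach a dialogue generator.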

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems