Reasoning Matters for 3D Visual Grounding
Hsiang-Wei Huang, Kuang-Ming Chen, Wenhao Chai, Cheng-Yen Yang, Jen-Hao Cheng, Jenq-Neng Hwang

TL;DR
This paper introduces a new data pipeline for 3D visual grounding that automatically synthesizes training data with reasoning processes, enabling the fine-tuning of a large language model that outperforms previous methods with significantly less data.
Contribution
The authors propose a data synthesis pipeline for 3D visual grounding and use its output to fine-tune Reason3DVG-8B, an LLM that improves grounding performance while requiring far less training data.
Findings
Reason3DVG-8B outperforms 3D-GRAND using only 1.6% of its training data.
Synthetic data with reasoning enhances 3D visual grounding performance.
The approach reduces data collection costs while improving model capabilities.
Abstract
The recent development of Large Language Models (LLMs) with strong reasoning ability has driven research in various domains such as mathematics, coding, and scientific discovery. Meanwhile, 3D visual grounding, a fundamental task in 3D understanding, remains challenging due to the limited reasoning ability of recent 3D visual grounding models. Most current methods incorporate a text encoder and a visual feature encoder to generate cross-modal fused features and predict the referred object. These models often require supervised training on extensive 3D annotation data. Recent research also focuses on scaling synthetic data to train stronger 3D visual grounding LLMs; however, the performance gain remains limited and disproportionate to the data collection cost. In this work, we propose a 3D visual grounding data pipeline, which is capable of automatically…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
