A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

TL;DR
This survey reviews the fundamental elements, recent advances, and future directions of text-guided 3D visual grounding (T-3DVG), the task of locating the object in a 3D scene that a language query describes.
Contribution
To the authors' knowledge, the first systematic survey of T-3DVG, covering the pipeline structure, existing approaches, datasets, evaluation metrics, and future research directions.
Findings
Summarizes recent research advances in T-3DVG.
Analyzes strengths and weaknesses of existing approaches.
Discusses benchmark datasets and evaluation metrics.
Abstract
Text-guided 3D visual grounding (T-3DVG), which aims to locate the specific object in a complicated 3D scene that semantically corresponds to a language query, has drawn increasing attention in the 3D research community over the past few years. Compared to 2D visual grounding, this task offers great potential but also greater challenges: it is closer to the real world, and both data collection and 3D point cloud processing are more complex. In this survey, we attempt to provide a comprehensive overview of T-3DVG progress, including its fundamental elements, recent research advances, and future research directions. To the best of our knowledge, this is the first systematic survey on the T-3DVG task. Specifically, we first provide a general structure of the T-3DVG pipeline with detailed components in a tutorial style, presenting a complete background overview. Then, we summarize…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Motion and Animation
