A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

Daizong Liu; Yang Liu; Wencan Huang; Wei Hu

arXiv:2406.05785 · cs.CV · July 23, 2024 · 1 citation

TL;DR

This survey reviews the progress, open challenges, and future directions of text-guided 3D visual grounding (T-3DVG), the task of locating an object in a 3D scene from a language query.

Contribution

First systematic survey providing an overview of T-3DVG, including pipeline structure, approaches, datasets, evaluation metrics, and future research directions.

Findings

01

Summarizes recent research advances in T-3DVG.

02

Analyzes strengths and weaknesses of existing approaches.

03

Discusses benchmark datasets and evaluation metrics.

Abstract

Text-guided 3D visual grounding (T-3DVG), which aims to locate a specific object that semantically corresponds to a language query from a complicated 3D scene, has drawn increasing attention in the 3D research community over the past few years. Compared to 2D visual grounding, this task presents great potential and challenges due to its closer proximity to the real world and the complexity of data collection and 3D point cloud source processing. In this survey, we attempt to provide a comprehensive overview of the T-3DVG progress, including its fundamental elements, recent research advances, and future research directions. To the best of our knowledge, this is the first systematic survey on the T-3DVG task. Specifically, we first provide a general structure of the T-3DVG pipeline with detailed components in a tutorial style, presenting a complete background overview. Then, we summarize…

Code & Models

Repositories

liudaizong/awesome-3d-visual-grounding (official)

Models

🤗 linhuixiao/Awesome-Visual-Grounding · model · ♡ 1

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Motion and Animation