Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems
Qihao Yuan, Kailai Li, Jiaming Zhang

TL;DR
This paper introduces CSVG, a zero-shot 3D visual grounding method that reformulates the task as a constraint satisfaction problem, enabling symbolic reasoning and improved accuracy over existing zero-shot approaches.
Contribution
The work presents a zero-shot framework that models 3D visual grounding as a CSP, allowing flexible handling of complex queries and outperforming existing zero-shot approaches on the ScanRefer and Nr3D datasets.
Findings
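Achieves a +7.0% accuracy improvement over existing zero-shot approaches on the ScanRefer dataset
Achieves a +11.2% accuracy improvement over existing zero-shot approaches on the Nr3D dataset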
Effectively handles negation and counting queries
Abstract
3D visual grounding (3DVG) aims to locate objects in a 3D scene with natural language descriptions. Supervised methods have achieved decent accuracy, but have a closed vocabulary and limited language understanding ability. Zero-shot methods utilize large language models (LLMs) to handle natural language descriptions, where the LLM either produces grounding results directly or generates programs that compute results (symbolically). In this work, we propose a zero-shot method that reformulates the 3DVG task as a Constraint Satisfaction Problem (CSP), where the variables and constraints represent objects and their spatial relations, respectively. This allows a global symbolic reasoning of all relevant objects, producing grounding results of both the target and anchor objects. Moreover, we demonstrate the flexibility of our framework by handling negation- and counting-based queries with…
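To make the CSP formulation concrete, here is a minimal sketch of the idea described in the abstract: variables stand for objects mentioned in the query, and constraints encode labels and spatial relations. It is not the authors' implementation. The names `Obj`, `is_label`, `near`, `not_near_any`, and `solve`, the toy scene, the 1.0 m distance threshold, and the brute-force search are all assumptions made for illustration, and the step where an LLM would turn the query into constraints is omitted.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Obj:
    """A detected scene object (hypothetical structure for this sketch)."""
    id: int
    label: str
    center: tuple  # (x, y, z), assumed to be in metres


def distance(a, b):
    """Euclidean distance between two object centres."""
    return sum((p - q) ** 2 for p, q in zip(a.center, b.center)) ** 0.5


def is_label(label):
    """Unary constraint: the assigned object must have this label."""
    return lambda o: o.label == label


def near(max_dist):
    """Binary constraint: the two assigned objects lie within max_dist."""
    return lambda a, b: distance(a, b) <= max_dist


def not_near_any(scene, label, max_dist):
    """Unary constraint for a negated relation: no object with the given
    label lies within max_dist of the assigned object."""
    return lambda o: not any(
        other.label == label and other.id != o.id and distance(o, other) <= max_dist
        for other in scene
    )


def solve(scene, variables, unary=(), binary=()):
    """Brute-force CSP search: try every assignment of distinct objects to
    the variables and keep those that satisfy all constraints."""
    solutions = []
    for combo in permutations(scene, len(variables)):
        assign = dict(zip(variables, combo))
        if all(pred(assign[v]) for v, pred in unary) and all(
            pred(assign[a], assign[b]) for a, b, pred in binary
        ):
            solutions.append(assign)
    return solutions


# Toy scene (made-up data, not from any dataset).
scene = [
    Obj(0, "chair", (0.0, 0.0, 0.0)),
    Obj(1, "chair", (4.0, 0.0, 0.0)),
    Obj(2, "table", (0.5, 0.0, 0.0)),
    Obj(3, "window", (4.5, 0.0, 1.0)),
]

# Query: "the chair near the table" -> one target and one anchor variable.
solutions = solve(
    scene,
    ["target", "anchor"],
    unary=[("target", is_label("chair")), ("anchor", is_label("table"))],
    binary=[("target", "anchor", near(1.0))],
)

# Negated query: "the chair that is not near a table".
neg_solutions = solve(
    scene,
    ["target"],
    unary=[
        ("target", is_label("chair")),
        ("target", not_near_any(scene, "table", 1.0)),
    ],
)
```

Because every variable is assigned in the same search, a solution grounds the anchor as well as the target, which mirrors the abstract's point about global reasoning over all relevant objects. A practical solver would likely use backtracking with pruning rather than enumerating every permutation.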
Taxonomy
Topics: Constraint Satisfaction and Optimization · Data Visualization and Analytics · Scheduling and Timetabling Solutions
