ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition

Ronggang Huang; Haoxin Yang; Yan Cai; Xuemiao Xu; Huaidong Zhang; Shengfeng He

arXiv:2507.11261 · cs.CV · July 29, 2025

TL;DR

ViewSRD introduces a structured multi-view decomposition framework for 3D visual grounding, effectively disentangling complex multi-anchor queries and resolving spatial inconsistencies caused by perspective variations.

Contribution

ViewSRD proposes a novel framework with modules for query decomposition, multi-view interaction, and reasoning, which together improve accuracy on complex 3D grounding tasks.

Findings

01. Outperforms state-of-the-art methods on 3D visual grounding datasets.
02. Effectively handles complex multi-anchor queries.
03. Resolves spatial inconsistencies due to perspective variations.
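The perspective problem behind the third finding can be shown with a toy 2D example. The same two objects are described differently depending on where the viewer stands. The sketch below is only an illustration of that ambiguity. It is not the paper's method. The function name `relation_in_view`, the z-up coordinate convention, the viewing heading measured in degrees from the +x axis, the four-word relation vocabulary, and the object coordinates are all assumptions made for this example.

```python
import math

def relation_in_view(target, anchor, view_deg):
    """Describe where `target` lies relative to `anchor` for a viewer looking
    along horizontal heading `view_deg` (degrees from +x, z-up; assumed convention)."""
    theta = math.radians(view_deg)
    forward = (math.cos(theta), math.sin(theta))  # viewing direction
    right = (math.sin(theta), -math.cos(theta))   # viewer's right-hand side
    dx, dy = target[0] - anchor[0], target[1] - anchor[1]
    lateral = dx * right[0] + dy * right[1]       # > 0: to the viewer's right
    depth = dx * forward[0] + dy * forward[1]     # > 0: farther from the viewer
    # Pick whichever axis dominates the offset (hypothetical 4-way vocabulary).
    if abs(lateral) >= abs(depth):
        return "right of" if lateral > 0 else "left of"
    return "behind" if depth > 0 else "in front of"

# One object pair (made-up coordinates), four hypothetical viewing headings.
chair, table = (1.0, 0.0), (0.0, 0.0)
for view in (0, 90, 180, 270):
    print(view, relation_in_view(chair, table, view))
```

With this setup, the chair is "behind", "right of", "in front of", and "left of" the table at headings 0, 90, 180, and 270. A single-view description such as "the chair left of the table" is therefore only meaningful once a viewpoint is fixed. This is the kind of inconsistency the findings say ViewSRD resolves.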

Abstract

3D visual grounding aims to identify and localize objects in a 3D space based on textual descriptions. However, existing methods struggle with disentangling targets from anchors in complex multi-anchor queries and resolving inconsistencies in spatial descriptions caused by perspective variations. To tackle these challenges, we propose ViewSRD, a framework that formulates 3D visual grounding as a structured multi-view decomposition process. First, the Simple Relation Decoupling (SRD) module restructures complex multi-anchor queries into a set of targeted single-anchor statements, generating a structured set of perspective-aware descriptions that clarify positional relationships. These decomposed representations serve as the foundation for the Multi-view Textual-Scene Interaction (Multi-TSI) module, which integrates textual and scene features across multiple viewpoints using shared,…
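The abstract describes how the SRD step turns a multi-anchor query into single-anchor statements. The sketch below illustrates that idea with a small rule-based decomposer. It is not the authors' implementation, and the abstract does not say how SRD performs the split. The query grammar ("the X that/which is R1 the A1 and R2 the A2"), the relation vocabulary `RELATIONS`, and the function `decompose` are hypothetical choices made for this example.

```python
import re

# Hypothetical relation vocabulary (illustrative only, not taken from the paper).
RELATIONS = ["in front of", "left of", "right of", "next to", "behind", "above", "below", "near"]
REL_PATTERN = "|".join(re.escape(r) for r in RELATIONS)

def decompose(query):
    """Split 'the X that is R1 the A1 and R2 the A2 ...' into
    (target, relation, anchor) triples, one per anchor."""
    text = query.strip().rstrip(".").lower()
    m = re.match(r"(?:the\s+)?(\w+)\s+(?:that|which)\s+is\s+(.+)", text)
    if m is None:
        raise ValueError(f"unsupported query form: {query!r}")
    target, rest = m.groups()
    triples = []
    # Each 'and'-separated clause is assumed to hold exactly one relation + anchor.
    for clause in re.split(r"\s+and\s+(?:is\s+)?", rest):
        c = re.fullmatch(rf"({REL_PATTERN})\s+(?:the\s+)?(.+)", clause)
        if c is None:
            raise ValueError(f"unrecognized clause: {clause!r}")
        triples.append((target, c.group(1), c.group(2)))
    return triples

query = "The chair that is left of the table and near the window."
for target, rel, anchor in decompose(query):
    print(f"the {target} is {rel} the {anchor}")
```

This prints "the chair is left of the table" and "the chair is near the window". Each statement now ties the target to a single anchor. A hand-written grammar like this breaks easily on real referring expressions. For example, it would split "between the bed and the lamp" at the wrong place. The example shows only the shape of the output, a set of single-anchor statements, and not a robust way to produce it.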
