DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng, Yong Xien Chng, Rui Huang, Yepeng Weng, Zhongchao Shi, Gao Huang

TL;DR
DenseGrounding enhances ego-centric 3D visual grounding by improving both visual and textual semantics, yielding significant accuracy gains and state-of-the-art performance; it also won the CVPR 2024 Autonomous Grand Challenge Innovation Award.
Contribution
The paper introduces DenseGrounding, a novel method that improves dense semantic understanding in 3D visual grounding through hierarchical scene and language semantic enhancement.
Findings
Improves overall accuracy by over 5.8% when trained on the full dataset
Outperforms existing methods in 3D visual grounding tasks
Wins CVPR 2024 Autonomous Grand Challenge Innovation Award
Abstract
Enabling intelligent agents to comprehend and interact with 3D environments through natural language is crucial for advancing robotics and human-computer interaction. A fundamental task in this field is ego-centric 3D visual grounding, where agents locate target objects in real-world 3D spaces based on verbal descriptions. However, this task faces two significant challenges: (1) loss of fine-grained visual semantics due to sparse fusion of point clouds with ego-centric multi-view images, (2) limited textual semantic context due to arbitrary language descriptions. We propose DenseGrounding, a novel approach designed to address these issues by enhancing both visual and textual semantics. For visual features, we introduce the Hierarchical Scene Semantic Enhancer, which retains dense semantics by capturing fine-grained global scene features and facilitating cross-modal alignment. For text…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
