PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding
Chenshu Hou, Liang Peng, Xiaopei Wu, Xiaofei He, Wenxiao Wang

TL;DR
This paper introduces PD-APE, a dual-branch decoding framework with adaptive position encoding for 3D visual grounding, effectively separating target object and environment understanding to improve accuracy.
Contribution
PD-APE decouples the decoding of the target object from that of its surrounding environment and adds adaptive position encoding, so that each branch attends to the text tokens relevant to it rather than sharing attention within a single module.
Findings
Outperforms state-of-the-art methods on the ScanRefer dataset
Achieves superior results on the Nr3D dataset
Demonstrates effective separation of object and environment attention
Abstract
3D visual grounding aims to identify objects in 3D point cloud scenes that match specific natural language descriptions. This requires the model to not only focus on the target object itself but also to consider the surrounding environment to determine whether the descriptions are met. Most previous works attempt to accomplish both tasks within the same module, which can easily lead to a distraction of attention. To this end, we propose PD-APE, a dual-branch decoding framework that separately decodes target object attributes and surrounding layouts. Specifically, in the target object branch, the decoder processes text tokens that describe features of the target object (e.g., category and color), guiding the queries to pay attention to the target object itself. In the surrounding branch, the queries align with other text tokens that carry surrounding environment information, making the…
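The dual-branch idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the split of text tokens into "target" and "surrounding" groups is already given as a boolean mask (the hypothetical `target_mask`), uses plain single-head scaled dot-product cross-attention in NumPy, and leaves out adaptive position encoding entirely. The function names `masked_cross_attention` and `dual_branch_decode` are made up for illustration.

```python
import numpy as np


def masked_cross_attention(queries, text_tokens, keep_mask):
    """Single-head scaled dot-product cross-attention over the tokens
    where keep_mask is True. Returns (attended features, attention weights)."""
    if not keep_mask.any():
        raise ValueError("keep_mask must select at least one text token")
    d = queries.shape[-1]
    scores = queries @ text_tokens.T / np.sqrt(d)        # (num_queries, num_tokens)
    scores = np.where(keep_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                              # masked tokens become 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ text_tokens, weights


def dual_branch_decode(queries, text_tokens, target_mask):
    """Hypothetical two-branch decode: one branch sees only target-object
    tokens, the other sees only the remaining (surrounding) tokens."""
    target_out, target_w = masked_cross_attention(queries, text_tokens, target_mask)
    surround_out, surround_w = masked_cross_attention(queries, text_tokens, ~target_mask)
    return target_out, surround_out, target_w, surround_w


# Toy example: 4 object queries, 6 text tokens of width 8.
# Assumption: the first two tokens describe the target (e.g. category, color).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
text_tokens = rng.normal(size=(6, 8))
target_mask = np.array([True, True, False, False, False, False])

target_out, surround_out, target_w, surround_w = dual_branch_decode(
    queries, text_tokens, target_mask
)
```

In this sketch each branch's attention weights are exactly zero on the other branch's tokens, which is one simple way to keep the two kinds of description from competing for the same attention budget. How PD-APE actually assigns tokens to branches and fuses the two outputs is not specified in the text above.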
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Robotics and Automated Systems
Methods: Softmax · Attention Is All You Need · Focus · ALIGN
