3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale

Yijia Fan; Jusheng Zhang; Kaitong Cai; Jing Yang; Jian Wang; Keze Wang

arXiv:2511.13211·cs.CV·April 27, 2026

TL;DR

3DAlign-DAER is a unified framework that combines a dynamic attention policy with an efficient retrieval strategy to improve fine-grained 3D-text alignment and to remain effective on large-scale 3D databases.

Contribution

The paper proposes a unified approach that combines a dynamic attention policy and an efficient retrieval strategy to enhance 3D-text alignment performance at scale.

Findings

01. Outperforms traditional retrieval methods such as KNN in both accuracy and efficiency.

02. Constructs Align3D-2M, a large-scale dataset with 2 million text-3D pairs.

03. Demonstrates superior results on multiple benchmarks.
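
For context on the KNN baseline named in the findings, the following is a minimal sketch of brute-force nearest-neighbor retrieval over an embedding database. It is not the paper's retrieval strategy; the function names (`cosine_similarity`, `knn_retrieve`) and the toy 2-D embeddings are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_retrieve(query, database, k):
    """Return indices of the k database embeddings most similar to the query.

    Hypothetical baseline: every database entry is scored against the query,
    so the cost grows linearly with the database size.
    """
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(database)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy usage: a text-query embedding against four made-up 3D-shape embeddings.
shape_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top2 = knn_retrieve([1.0, 0.0], shape_embeddings, k=2)
```

Because every query scans the whole database, this kind of exhaustive search becomes expensive as the database grows, which is the scaling concern the findings refer to.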

Abstract

Despite recent advancements in 3D-text cross-modal alignment, existing state-of-the-art methods still struggle to align fine-grained textual semantics with detailed geometric structures, and their alignment performance degrades significantly when scaling to large-scale 3D databases. To overcome this limitation, we introduce 3DAlign-DAER, a unified framework designed to align text and 3D geometry via the proposed dynamic attention policy and efficient retrieval strategy, capturing subtle correspondences for diverse cross-modal retrieval and classification tasks. Specifically, during training, our proposed dynamic attention policy (DAP) employs the Hierarchical Attention Fusion (HAF) module to represent the alignment as learnable fine-grained token-to-point attentions. To optimize these attentions across different tasks and geometric hierarchies, our DAP further exploits the Monte…
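
The token-to-point attention mentioned in the abstract can be pictured with a generic scaled dot-product cross-attention, in which each text token attends over per-point features of a 3D shape. The sketch below is only an illustration of that general idea under stated assumptions: it is not the authors' HAF module, it has no learnable parameters or hierarchy, and the names `softmax` and `token_to_point_attention` as well as the toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_to_point_attention(token_feats, point_feats):
    """Generic cross-attention from text tokens (queries) to 3D points (keys/values).

    Assumption: token and point features already share the same dimension.
    Returns (weights, attended), where weights[i][j] is how strongly token i
    attends to point j, and attended[i] is the weighted sum of point features.
    """
    dim = len(point_feats[0])
    scale = 1.0 / math.sqrt(dim)
    weights, attended = [], []
    for tok in token_feats:
        # Scaled dot product between this token and every point feature.
        scores = [scale * sum(t * p for t, p in zip(tok, pt)) for pt in point_feats]
        w = softmax(scores)
        weights.append(w)
        attended.append(
            [sum(w[j] * point_feats[j][d] for j in range(len(point_feats))) for d in range(dim)]
        )
    return weights, attended


# Toy usage: two tokens attending over three points with 2-D features.
tokens = [[1.0, 0.0], [0.0, 1.0]]
points = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
attn_weights, attn_out = token_to_point_attention(tokens, points)
```

Each row of `attn_weights` sums to 1, so it can be read as a soft assignment of one token to the points of the shape.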

Code & Models

Repositories

waltstephen/Cost-Effective-Communication (GitHub)

Videos

3DAlign-DAER: Dynamic Attention Policy and Efficient Retrieval Strategy for Fine-grained 3D-Text Alignment at Scale (Underline)