Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
Huize Li, Qinggang Wang, Bing Gao, Dan Chen, Yu Huang, Xin Xin

TL;DR
This paper introduces DANMP, a specialized near-memory processing architecture for multi-scale deformable attention in vision tasks, significantly improving speed and energy efficiency over GPUs by addressing irregular memory access and workload imbalance.
Contribution
DANMP combines hardware and software innovations, including non-uniform near-memory processing (NMP) integration and clustering-based data reuse, to accelerate multi-scale deformable attention (MSDAttn) efficiently.
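To see why uniform PE placement can waste hardware under skewed access patterns, here is a toy model in Python. It is not DANMP's design or cost model. It assumes one PE per bank, that each access costs one time unit, and that a batch finishes when the busiest bank finishes; the function names `count_bank_accesses` and `pe_utilization` are hypothetical.

```python
def count_bank_accesses(bank_ids, num_banks):
    """Count how many sampled accesses land in each bank.

    bank_ids: iterable of bank indices, one per access (hypothetical mapping).
    """
    counts = [0] * num_banks
    for bank in bank_ids:
        counts[bank] += 1
    return counts


def pe_utilization(accesses_per_bank):
    """Average PE utilization when every bank has exactly one PE.

    Assumption: the batch takes as long as the busiest bank, so the
    other PEs idle for the remainder of that time.
    """
    busiest = max(accesses_per_bank)
    if busiest == 0:
        return 1.0  # no work at all: nothing is waiting on anything
    mean = sum(accesses_per_bank) / len(accesses_per_bank)
    return mean / busiest


# Skewed sampling: most accesses hit bank 0.
skewed = count_bank_accesses([0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 3], num_banks=4)
print(skewed, pe_utilization(skewed))
```

In this toy setting, a perfectly balanced access pattern gives a utilization of 1.0, while the skewed pattern above leaves most PEs idle for much of the batch.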
Findings
97.43x speedup over NVIDIA A6000 GPU
208.47x energy efficiency improvement
Effective handling of irregular sampling patterns
Abstract
Multi-Scale Deformable Attention (MSDAttn) has become a fundamental component in various vision tasks due to its effective multi-scale grid sampling (MSGS). However, its reliance on random sampling results in highly irregular memory access patterns, making it a memory-intensive operation inefficient for GPUs. Near-memory processing (NMP) offers a promising solution for accelerating memory-bound kernels, yet existing NMP-based attention accelerators remain suboptimal for MSDAttn due to incompatible load balancing and data reuse strategies. Specifically, current NMP solutions uniformly distribute processing elements (PEs) across all banks, leading to significant PE underutilization and excessive cross-bank data transfers. Moreover, most rely on locality-based reuse, which fails under MSDAttn's unpredictable sampling patterns. To address these challenges, this paper presents DANMP, a…
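The irregular access pattern the abstract describes can be illustrated with a minimal pure-Python sketch of multi-scale grid sampling for a single query and a single channel. It uses standard bilinear interpolation with zero padding, not the paper's implementation. The function names (`bilinear_sample`, `msgs`), the normalized-coordinate convention (pixel = normalized * size - 0.5), and the `accesses` trace are assumptions made for illustration.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly sample a 2D map (list of rows) at fractional pixel (x, y).

    Out-of-range corners contribute zero. Returns the value and the
    (row, col) cells that were actually read.
    """
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value, touched = 0.0, []
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
            if weight > 0 and 0 <= xi < w and 0 <= yi < h:
                value += weight * fmap[yi][xi]
                touched.append((yi, xi))
    return value, touched


def msgs(levels, points, attn_weights):
    """Weighted sum of samples gathered from several feature-map scales.

    levels:       list of 2D maps, one per scale
    points:       per level, a list of normalized (x, y) in [0, 1]
    attn_weights: per level, one attention weight per sampling point
    """
    out, accesses = 0.0, []
    for lvl, (fmap, pts, ws) in enumerate(zip(levels, points, attn_weights)):
        h, w = len(fmap), len(fmap[0])
        for (nx, ny), a in zip(pts, ws):
            # Assumed convention: map normalized coordinates to pixel centers.
            v, touched = bilinear_sample(fmap, nx * w - 0.5, ny * h - 0.5)
            out += a * v
            accesses.extend((lvl, r, c) for r, c in touched)
    return out, accesses


levels = [[[1.0] * 4 for _ in range(4)], [[1.0] * 2 for _ in range(2)]]
points = [[(0.5, 0.5), (0.3, 0.7)], [(0.5, 0.5)]]
weights = [[0.25, 0.25], [0.5]]
out, accesses = msgs(levels, points, weights)
print(out, accesses)
```

The `accesses` trace shows that each sampling point reads up to four cells whose positions depend on data-dependent offsets and differ per scale. This is the kind of scattered gather that is hard to serve from caches or contiguous memory bursts.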
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
