DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing
Yansong Xu, Dongxu Lyu, Zhenyu Li, Zilong Wang, Yuzhou Chen, Gang Wang, Zhican Wang, Haomin Li, Guanghui He

TL;DR
DEFA is an algorithm-architecture co-design that accelerates multi-scale deformable attention (MSDeformAttn) through pruning and multi-scale parallel processing, reporting a 10.1-31.9x speedup over GPUs and up to 37.7x better energy efficiency.
Contribution
It introduces the first dedicated acceleration method for MSDeformAttn, combining pruning of feature maps and sampling points with multi-scale parallelism to reduce the memory footprint and increase throughput.
Findings
Achieves 10.1-31.9x speedup over GPUs.
Reduces memory footprint by over 80%.
Boosts energy efficiency by up to 37.7x.
Abstract
Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating explicit superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular data access and enormous memory requirements, leading to severe PE underutilization. Meanwhile, existing approaches for attention acceleration cannot be directly applied to MSDeformAttn due to a lack of support for this distinct procedure. Therefore, we propose a dedicated algorithm-architecture co-design dubbed DEFA, the first-of-its-kind method for MSDeformAttn acceleration. At the algorithm level, DEFA adopts frequency-weighted pruning and probability-aware pruning for feature maps and sampling points respectively, alleviating the memory footprint by over 80%. At the architecture level, it explores multi-scale parallelism to boost the throughput…
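To make the operator concrete, here is a minimal pure-Python sketch of multi-scale grid-sampling for a single query and a single channel, with sampling points skipped when their attention weight is small. It is an illustration under stated assumptions, not DEFA's actual algorithm: the function names, the pixel-offset coordinate convention, and the simple `threshold` rule standing in for probability-aware pruning are all hypothetical, and the abstract does not specify how the paper's pruning criteria are computed.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly sample a 2D feature map (list of rows) at pixel coords (x, y).

    Neighbours that fall outside the map contribute zero.
    """
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            if 0 <= xi < w and 0 <= yi < h:
                wx = 1.0 - abs(x - xi)  # horizontal interpolation weight
                wy = 1.0 - abs(y - yi)  # vertical interpolation weight
                value += wx * wy * fmap[yi][xi]
    return value


def ms_deform_attn_single(levels, ref, offsets, weights, threshold=0.0):
    """Weighted sum of sampled values across all feature-map levels.

    levels:    list of 2D feature maps, one per scale
    ref:       (rx, ry) reference point, normalised to [0, 1]
    offsets:   offsets[l][k] = (dx, dy) in pixels at level l (assumed convention)
    weights:   weights[l][k] = attention weight of that sampling point
    threshold: hypothetical pruning rule; points with weight < threshold
               are skipped and never fetched from memory
    Returns (output value, number of points actually sampled).
    """
    out, sampled = 0.0, 0
    for fmap, offs, ws in zip(levels, offsets, weights):
        h, w = len(fmap), len(fmap[0])
        for (dx, dy), a in zip(offs, ws):
            if a < threshold:
                continue  # pruned: no irregular memory access for this point
            x = ref[0] * (w - 1) + dx
            y = ref[1] * (h - 1) + dy
            out += a * bilinear_sample(fmap, x, y)
            sampled += 1
    return out, sampled
```

In this sketch the surviving weights are not renormalised after pruning, so the pruned output is an approximation of the full one. The data-dependent sampling coordinates are what make the memory accesses irregular, which is the cost the abstract attributes to this operator.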
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · Medical Imaging Techniques and Applications
