A Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search

Wendong Mao; Mingfan Zhao; Jianfeng Guan; Qiwei Dong; Zhongfeng Wang

arXiv:2507.11549 · cs.CV · July 29, 2025

TL;DR

This paper introduces a hardware-efficient framework for deformable attention transformers using neural architecture search, significantly reducing memory access and maintaining high accuracy for edge hardware deployment.

Contribution

It proposes a NAS-based slicing strategy for DAT that optimizes hardware cost and accuracy without altering the model architecture.

Findings

01

Incurs only a 0.2% accuracy drop compared with the baseline DAT.

02

Reduces the number of DRAM accesses on FPGA hardware to 18% of that of existing DAT acceleration methods.

03

Demonstrates effective hardware acceleration on edge devices.

Abstract

Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access patterns, posing significant challenges for efficient hardware deployment. Existing acceleration methods either incur high hardware overhead or compromise model accuracy. To address these issues, this paper proposes a hardware-friendly optimization framework for DAT. First, a neural architecture search (NAS)-based method with a new slicing strategy is proposed to automatically divide the input feature into uniform patches during the inference process, avoiding memory conflicts without modifying model architecture. The method explores the optimal slice configuration by jointly optimizing hardware cost and inference accuracy. Secondly, an FPGA-based…
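The slicing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `slice_uniform` and `pick_slice_config`, the weighting factor `lam`, and the accuracy and cost callbacks are all hypothetical, and a plain exhaustive search stands in for the NAS procedure. The sketch only shows (a) splitting a feature map into uniform patches and (b) choosing a slice configuration by trading accuracy against a hardware cost.

```python
import numpy as np


def slice_uniform(feature, rows, cols):
    """Split an (H, W, C) feature map into rows * cols equal patches.

    Hypothetical helper: returns an array of shape
    (rows * cols, H // rows, W // cols, C), patches in row-major order.
    """
    h, w, c = feature.shape
    if h % rows or w % cols:
        raise ValueError("slice counts must divide the feature size evenly")
    ph, pw = h // rows, w // cols
    # (rows, ph, cols, pw, C) -> (rows, cols, ph, pw, C)
    patches = feature.reshape(rows, ph, cols, pw, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, ph, pw, c)


def pick_slice_config(candidates, accuracy_of, hw_cost_of, lam=0.5):
    """Pick the candidate maximizing accuracy - lam * hardware cost.

    Assumption: a simple weighted objective and exhaustive search are used
    here in place of the paper's NAS-based joint optimization.
    """
    return max(candidates, key=lambda cfg: accuracy_of(cfg) - lam * hw_cost_of(cfg))


# Toy usage with made-up accuracy and cost tables (illustrative values only).
feature = np.arange(16).reshape(4, 4, 1)
patches = slice_uniform(feature, 2, 2)

toy_accuracy = {(1, 1): 0.84, (2, 2): 0.83, (4, 4): 0.70}
toy_hw_cost = {(1, 1): 1.00, (2, 2): 0.30, (4, 4): 0.20}
best = pick_slice_config(list(toy_accuracy), toy_accuracy.get, toy_hw_cost.get)
```

In this toy setup, finer slicing lowers the made-up hardware cost but eventually hurts the made-up accuracy, so the selected configuration depends on how `lam` weights the two terms.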

Taxonomy

Topics: Advanced Neural Network Applications · Embedded Systems Design Techniques · Image Enhancement Techniques