UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices

Seul-Ki Yeom; Tae-Ho Kim

arXiv:2412.02344 · cs.CV · December 4, 2024

Open Access

TL;DR

UniForm introduces Reuse Attention, a mechanism that consolidates per-head attention computations into a single shared attention matrix. This significantly reduces memory and computational demands and enables efficient deployment of vision transformers on edge devices without sacrificing accuracy.
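As a rough illustration of where the memory saving comes from, assume N tokens and h heads, and count only the materialized attention (softmax) matrices per image. Multi-head attention stores one N × N matrix per head, while a shared matrix is stored once:

$$
\underbrace{h\,N^{2}}_{\text{MHA: one } N \times N \text{ matrix per head}} \;\longrightarrow\; \underbrace{N^{2}}_{\text{one shared matrix}}
$$

With hypothetical values N = 196 tokens and h = 8 heads, this is 8 × 38,416 = 307,328 stored scores versus 38,416, an 8× reduction for this term alone. The count ignores other costs such as the Q, K, and V projections.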

Contribution

The paper presents Reuse Attention, a new attention mechanism that improves the efficiency and scalability of vision transformers on resource-constrained edge devices.

Findings

01

Achieves 76.7% Top-1 accuracy on ImageNet-1K with 21.8 ms inference latency on Jetson AGX Orin.

02

Outperforms existing attention mechanisms such as Linear Attention and Flash Attention in inference speed and memory usage.

03

Demonstrates versatility across GPUs and edge platforms, enabling real-time applications.
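To relate the reported latency to the real-time claim, assume the 21.8 ms figure is per-image latency at batch size 1 with images processed back to back. The measurement setup is not stated here. Under that assumption, the implied throughput is:

$$
\frac{1000\ \text{ms/s}}{21.8\ \text{ms/image}} \approx 45.9\ \text{images/s}
$$

That is above the typical 30 frames-per-second rate of video streams.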

Abstract

Transformer-based architectures have demonstrated remarkable success across various domains, but their deployment on edge devices remains challenging due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Unlike traditional multi-head attention (MHA), which redundantly computes separate attention matrices for each head, Reuse Attention consolidates these computations into a shared attention matrix, significantly reducing memory overhead and computational complexity. Comprehensive experiments on ImageNet-1K and downstream tasks show that the proposed UniForm models leveraging Reuse Attention achieve state-of-the-art ImageNet classification accuracy while outperforming…
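The pure-Python sketch below illustrates the contrast described in the abstract. It assumes, for illustration only, that the shared matrix comes from a single query/key projection pair and that every head's value projection reuses it. The function names (`multi_head_attention`, `shared_attention`) and the toy sizes are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(s):
    """Row-wise, numerically stable softmax."""
    out = []
    for row in s:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def attention_matrix(x, wq, wk):
    """softmax(Q K^T / sqrt(d_head)) for one query/key projection pair: N x N."""
    q, k = matmul(x, wq), matmul(x, wk)
    scale = 1.0 / math.sqrt(len(q[0]))
    return softmax_rows([[v * scale for v in row] for row in matmul(q, transpose(k))])


def concat_heads(outputs):
    """Concatenate per-head N x d_head outputs along the feature axis."""
    return [sum((o[i] for o in outputs), []) for i in range(len(outputs[0]))]


def multi_head_attention(x, wq_heads, wk_heads, wv_heads):
    """Standard MHA: every head builds its own N x N attention matrix."""
    maps = [attention_matrix(x, wq, wk) for wq, wk in zip(wq_heads, wk_heads)]
    outputs = [matmul(a, matmul(x, wv)) for a, wv in zip(maps, wv_heads)]
    return concat_heads(outputs), maps


def shared_attention(x, wq, wk, wv_heads):
    """Hypothetical reuse-style variant: one N x N matrix reused by all value heads."""
    a = attention_matrix(x, wq, wk)
    outputs = [matmul(a, matmul(x, wv)) for wv in wv_heads]
    return concat_heads(outputs), [a]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_tokens, d_model, n_heads, d_head = 6, 8, 4, 2  # toy sizes (assumed)
    x = rand_matrix(n_tokens, d_model, rng)
    wq = [rand_matrix(d_model, d_head, rng) for _ in range(n_heads)]
    wk = [rand_matrix(d_model, d_head, rng) for _ in range(n_heads)]
    wv = [rand_matrix(d_model, d_head, rng) for _ in range(n_heads)]

    mha_out, mha_maps = multi_head_attention(x, wq, wk, wv)
    ra_out, ra_maps = shared_attention(x, wq[0], wk[0], wv)
    print(len(mha_maps), len(ra_maps))  # 4 1
    print(len(ra_out), len(ra_out[0]))  # 6 8
```

Both functions return outputs of the same shape (N × heads · d_head). In this sketch the only difference is the number of N × N matrices computed and stored: one per head for `multi_head_attention` versus one in total for `shared_attention`.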

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · Infrared Target Detection Methodologies

Methods: Attention Is All You Need · Softmax · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Multi-Head Attention