UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices
Seul-Ki Yeom, Tae-Ho Kim

TL;DR
UniForm introduces a novel Reuse Attention mechanism that consolidates attention computations to significantly reduce memory and computational demands, enabling efficient deployment of vision transformers on edge devices without sacrificing accuracy.
Contribution
The paper presents Reuse Attention, a new attention mechanism that improves efficiency and scalability of vision transformers for resource-constrained edge devices.
Findings
Achieves 76.7% Top-1 accuracy on ImageNet-1K with 21.8 ms inference time on Jetson AGX Orin.
Outperforms existing attention mechanisms such as Linear Attention and Flash Attention in inference speed and memory usage.
Demonstrates versatility across GPUs and edge platforms, enabling real-time applications.
Abstract
Transformer-based architectures have demonstrated remarkable success across various domains, but their deployment on edge devices remains challenging due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Unlike traditional multi-head attention (MHA), which redundantly computes separate attention matrices for each head, Reuse Attention consolidates these computations into a shared attention matrix, significantly reducing memory overhead and computational complexity. Comprehensive experiments on ImageNet-1K and downstream tasks show that the proposed UniForm models leveraging Reuse Attention achieve state-of-the-art ImageNet classification accuracy while outperforming…
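The core idea in the abstract, computing one attention matrix and sharing it across heads instead of computing one per head, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the shared queries and keys are produced or how values are split, so the single query/key pair, the per-head value matrices, and every function name below (`shared_attention`, `reuse_attention`, and the helpers) are assumptions made for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def shared_attention(q, k):
    """Compute one (tokens x tokens) attention matrix from a single query/key pair.

    Assumption: a single Q and K serve all heads; the source does not specify this.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return softmax_rows(scores)


def reuse_attention(q, k, values_per_head):
    """Apply the same attention matrix to every head's value matrix.

    Assumption: each head keeps its own values and only the attention map is shared.
    """
    attn = shared_attention(q, k)  # computed once, not once per head
    return attn, [matmul(attn, v) for v in values_per_head]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
tokens, dim, heads = 4, 8, 3
q = random_matrix(tokens, dim, rng)
k = random_matrix(tokens, dim, rng)
values = [random_matrix(tokens, dim, rng) for _ in range(heads)]

attn, outputs = reuse_attention(q, k, values)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # prints: 3 4 8
```

In this sketch the softmax over the tokens-by-tokens score matrix runs once regardless of the number of heads, whereas standard MHA would run it once per head. That is the kind of saving the abstract describes, though the actual design in the paper may differ.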
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · Infrared Target Detection Methodologies
Methods: Attention Is All You Need · Softmax · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Multi-Head Attention
