Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

Alexander Wong, Mohammad Javad Shafiee, Saad Abbasi, Saeejith Nair, and Mahmoud Famouri

arXiv:2208.06980·cs.CV·February 6, 2023·5 cites

Open Access

TL;DR

This paper introduces AttendNeXt, a highly efficient self-attention neural network backbone optimized for edge devices, achieving significant speed and size improvements while maintaining high accuracy for TinyML applications.

Contribution

The paper proposes a novel double-condensing attention condenser design and a machine-driven architecture exploration strategy to create a faster, smaller, and more accurate self-attention backbone for edge devices.

Findings

1. AttendNeXt is over 10x faster than state-of-the-art backbones.
2. It is 1.37x smaller than MobileNetv3-L with higher accuracy.
3. It achieves 1.1% higher top-1 accuracy than MobileViT XS on ImageNet.

Abstract

With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design, called the double-condensing attention condenser, that allows for highly condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes design constraints based on best practices for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor…

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification

Methods: MobileViT · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings