Efficient Attention-free Video Shift Transformers

Adrian Bulat; Brais Martinez; Georgios Tzimiropoulos

arXiv:2208.11108·cs.CV·August 24, 2022

TL;DR

This paper introduces VAST, an attention-free video transformer that replaces self-attention with shift operations for token mixing. The authors report that it outperforms recent video transformers on action recognition benchmarks at lower computational cost.

Contribution

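The paper presents VAST, which it describes as the first attention-free, shift-based video transformer. Its Affine-Shift block is designed to approximate the operations of the MHSA block in a Transformer layer, and the authors report that the resulting model outperforms state-of-the-art models in both efficiency and accuracy.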

Findings

01. VAST outperforms recent transformers on action recognition benchmarks.

02. The Affine-Shift block achieves high accuracy with low computational cost.

03. VAST is the first purely shift-based video transformer.

Abstract

This paper tackles the problem of efficient video recognition. In this area, video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum. At the same time, there have been some attempts in the image domain which challenge the necessity of the self-attention operation within the transformer architecture, advocating the use of simpler approaches for token mixing. However, there are no results yet for the case of video recognition, where the self-attention operator has a significantly higher impact (compared to the case of images) on efficiency. To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient and accurate attention-free block based on the shift operator, coined Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer.…
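To make the idea of shift-based token mixing concrete, the following is a minimal sketch of a generic shift operator along a single token axis (for example, time). It is not the paper's Affine-Shift block, whose details are not given in the abstract above. The function name `shift_tokens`, the `group_size` parameter, and the zero-padding at the sequence ends are all assumptions made for illustration. The sketch only shows how moving a few channels between neighbouring tokens mixes information without any multiplications or learned parameters.

```python
def shift_tokens(tokens, group_size=1):
    """Mix neighbouring tokens by shifting channel groups (illustrative only).

    tokens: list of T token vectors, each a list of C numbers.
    group_size: hypothetical parameter; number of channels moved in each
        direction. The first `group_size` channels are taken from the
        previous token, the next `group_size` from the following token,
        and all remaining channels are left in place.
    Vacated positions at the sequence ends are zero-filled (an assumption).
    """
    num_tokens = len(tokens)
    num_channels = len(tokens[0])
    if 2 * group_size > num_channels:
        raise ValueError("need at least 2 * group_size channels")

    # Start from a copy so untouched channels keep their original values.
    mixed = [list(tok) for tok in tokens]
    for t in range(num_tokens):
        # Channels shifted forward: token t receives them from token t - 1.
        for c in range(group_size):
            mixed[t][c] = tokens[t - 1][c] if t > 0 else 0.0
        # Channels shifted backward: token t receives them from token t + 1.
        for c in range(group_size, 2 * group_size):
            mixed[t][c] = tokens[t + 1][c] if t + 1 < num_tokens else 0.0
    return mixed


# Three tokens with three channels each.
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mixed_example = shift_tokens(example, group_size=1)
```

After the shift, each token holds one channel from its predecessor, one from its successor, and keeps the rest, so a following per-token linear layer sees information from neighbouring positions. A video model would apply the same idea along the temporal and spatial axes.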

Taxonomy

Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Dropout · Softmax · Label Smoothing