Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

Matthew Dutson; Yin Li; Mohit Gupta

arXiv:2308.13494·cs.CV·August 28, 2023

TL;DR

Eventful Transformers leverage temporal redundancy in video data to reduce computational costs by re-processing only significantly changed tokens, achieving 2-4x efficiency gains with minimal accuracy loss.

Contribution

The paper introduces a method to identify and re-process only changed tokens in Transformers, enabling adaptive computation control for video tasks without extensive retraining.
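The core idea of re-processing only changed tokens can be illustrated for a single token-wise operation. The following is a minimal sketch under stated assumptions, not the authors' implementation in WISION-Lab/eventful-transformer. `ThresholdGate`, `EventfulTokenwiseOp`, the L2 change metric, and the rule that a token's reference is refreshed only when it is re-processed are all hypothetical choices made for illustration. The sketch covers only per-token operations (for example, an MLP applied to each token independently) and does not model how self-attention would be updated.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


class ThresholdGate:
    """Hypothetical gate: flags tokens whose L2 distance from their stored
    reference exceeds `threshold` (metric and update rule are assumptions)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.reference: Optional[List[Vector]] = None

    def select(self, tokens: List[Vector]) -> List[int]:
        if self.reference is None:
            # First frame: nothing is cached yet, so every token is processed.
            self.reference = [list(t) for t in tokens]
            return list(range(len(tokens)))
        changed = [
            i
            for i, (t, r) in enumerate(zip(tokens, self.reference))
            if math.dist(t, r) > self.threshold
        ]
        # Refresh the reference only for re-processed tokens, so slow drift in
        # the other tokens keeps accumulating until it crosses the threshold.
        for i in changed:
            self.reference[i] = list(tokens[i])
        return changed


class EventfulTokenwiseOp:
    """Hypothetical wrapper: applies a per-token `op` only to gated tokens and
    reuses cached outputs for all others."""

    def __init__(self, op: Callable[[Vector], Vector], threshold: float) -> None:
        self.op = op
        self.gate = ThresholdGate(threshold)
        self.outputs: List[Vector] = []

    def __call__(self, tokens: List[Vector]) -> Tuple[List[Vector], int]:
        changed = self.gate.select(tokens)
        if not self.outputs:
            # First call: the gate selected every token, so fill every slot.
            self.outputs = [[] for _ in tokens]
        for i in changed:
            self.outputs[i] = self.op(tokens[i])
        # Return a copy of all outputs plus how many tokens were recomputed.
        return [list(o) for o in self.outputs], len(changed)


def double(v: Vector) -> Vector:
    """Stand-in for a token-wise layer such as an MLP."""
    return [2.0 * x for x in v]


layer = EventfulTokenwiseOp(double, threshold=0.5)
out1, n1 = layer([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # all 3 tokens computed
out2, n2 = layer([[0.1, 0.0], [1.0, 1.0], [6.0, 5.0]])  # only token 2 moved > 0.5
out3, n3 = layer([[0.75, 0.0], [1.0, 1.0], [6.0, 5.0]])  # token 0's drift now exceeds 0.5
```

In the second frame, token 0's output is deliberately left stale because its change (0.1) is below the threshold; in the third frame its accumulated drift from the stored reference (0.75) crosses the threshold and it is recomputed. Raising the threshold saves more compute at the cost of more stale outputs.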

Findings

1. Achieves 2-4x computational savings on video datasets.
2. Maintains high accuracy, with only minor reductions.
3. Applicable to existing Transformers, often without retraining.

Abstract

Vision Transformers achieve impressive accuracy across a range of visual recognition tasks. Unfortunately, their accuracy frequently comes with high computational costs. This is a particular issue in video recognition, where models are often applied repeatedly across frames or temporal chunks. In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing. We describe a method for identifying and re-processing only those tokens that have changed significantly over time. Our proposed family of models, Eventful Transformers, can be converted from existing Transformers (often without any re-training) and give adaptive control over the compute cost at runtime. We evaluate our method on large-scale datasets for video object detection (ImageNet VID) and action recognition (EPIC-Kitchens 100). Our approach leads to significant…
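The abstract's "adaptive control over the compute cost at runtime" can be illustrated with a per-frame token budget: rather than a fixed change threshold, re-process only the `k` tokens that changed most, and raise or lower `k` to trade accuracy for cost. This is a minimal sketch, assuming an L2 change score against a stored reference; `select_top_k` and its tie-breaking rule are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List, Sequence


def select_top_k(
    tokens: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical budget policy: indices of the k most-changed tokens.

    Change is the L2 distance from each token's stored reference (an
    assumption of this sketch). Ties go to the lower index. Indices are
    returned in ascending order so they can be used to gather tokens.
    """
    scores = [math.dist(t, r) for t, r in zip(tokens, reference)]
    k = max(0, min(k, len(scores)))  # clamp the budget to a valid range
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


# Same frame, different runtime budgets: a larger k spends more compute.
reference = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
frame = [[0.1, 0.0], [3.0, 0.0], [0.0, 0.5], [2.0, 0.0]]
for budget in (1, 2, 4):
    print(budget, select_top_k(frame, reference, budget))
```

Because the budget is just an argument, it can be changed from one frame to the next (for example, lowered when a latency target is at risk) without touching model weights.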

Code & Models

Repositories

WISION-Lab/eventful-transformer
PyTorch · Official

Videos

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition