Reversible Vision Transformers

Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik

arXiv:2302.04869 · cs.CV · February 10, 2023

TL;DR

Reversible Vision Transformers decouple GPU memory use from model depth by recomputing activations instead of storing them. This cuts the training memory footprint by up to 15.5x at comparable accuracy, and deeper models also train with higher throughput.

Contribution

This work introduces reversible variants of Vision Transformer and Multiscale Vision Transformers, giving memory-efficient backbones suited to training on resource-limited hardware.

Findings

01. Memory footprint reduced by up to 15.5x

02. Throughput increased by up to 2.3x for deeper models

03. Accuracy comparable to the non-reversible models across image classification, object detection and video classification

Abstract

We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with efficient memory usage. We adapt two popular models, namely Vision Transformer and Multiscale Vision Transformers, to reversible variants and benchmark extensively across both model sizes and tasks of image classification, object detection and video classification. Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes. Finally, we find that the additional computational burden of recomputing activations is more than overcome for…
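The idea of recomputing activations rather than storing them can be illustrated with a minimal sketch. It assumes the standard two-stream reversible residual coupling (y1 = x1 + F(x2), y2 = x2 + G(y1)); the functions `f` and `g` below are hypothetical toy stand-ins for the attention and MLP sub-blocks, shared across layers for brevity, and are not the authors' implementation. The point shown is only that block inputs can be reconstructed from block outputs, so intermediate activations need not be cached during the forward pass.

```python
import math

def f(x):
    # Hypothetical stand-in for an attention sub-block (assumption, not the paper's layer).
    return [math.tanh(0.5 * v) for v in x]

def g(x):
    # Hypothetical stand-in for an MLP sub-block (assumption, not the paper's layer).
    return [0.5 * math.sin(v) for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

def forward_block(x1, x2):
    # Two-stream reversible coupling: each stream is updated from the other.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2

def inverse_block(y1, y2):
    # Undo the coupling in reverse order; only the block outputs are needed.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2

def forward_stack(x1, x2, depth):
    # Run `depth` blocks while keeping only the current pair of activations.
    for _ in range(depth):
        x1, x2 = forward_block(x1, x2)
    return x1, x2

def reconstruct_inputs(y1, y2, depth):
    # Walk back through the stack, recomputing each block's inputs.
    for _ in range(depth):
        y1, y2 = inverse_block(y1, y2)
    return y1, y2

x1, x2 = [0.1, -0.2, 0.3], [0.05, 0.4, -0.1]
y1, y2 = forward_stack(x1, x2, depth=8)
r1, r2 = reconstruct_inputs(y1, y2, depth=8)
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print("max reconstruction error:", max_err)
```

In a real training loop the backward pass would call the inverse one block at a time and recompute gradients locally, which is where the extra computation mentioned in the abstract comes from. Reconstruction is exact only up to floating-point rounding.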

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Vision Transformer · Adam · Label Smoothing · Softmax