Sparse then Prune: Toward Efficient Vision Transformers

Yogi Prasetyo; Novanto Yudistira; Agus Wahyu Widodo

arXiv:2307.11988 · cs.CV · July 25, 2023

TL;DR

This paper applies Sparse Regularization and Pruning to Vision Transformers for image classification and reports that the combination improves accuracy while reducing computational cost.

Contribution

It introduces a combined approach of Sparse Regularization and Pruning for Vision Transformers, showing improved accuracy and efficiency over traditional pruning methods.

Findings

1. Sparse Regularization increases accuracy by 0.12%.
2. Pruning with Sparse Regularization further improves accuracy, e.g., by 1.764% on CIFAR-100.
3. The method enhances Vision Transformer performance across the evaluated datasets (CIFAR-10, CIFAR-100, and ImageNet-100).

Abstract

The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Regularization to Vision Transformers and the impact of Pruning, either after Sparse Regularization or without it, on the trade-off between performance and efficiency. To accomplish this, we apply Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process for the Vision Transformer model consists of two parts: pre-training and fine-tuning. Pre-training utilizes…
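
The two ingredients named in the abstract can be illustrated with a minimal sketch. The excerpt above does not say which sparsity penalty or pruning criterion the authors use, so this example assumes an L1 penalty and magnitude-based pruning, two common choices. The function names `l1_penalty` and `magnitude_prune`, the coefficient `lam`, and the toy weight list are hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch only; NOT the authors' implementation.
# Assumptions: L1 penalty as the sparse regularizer, magnitude-based pruning.

def l1_penalty(weights, lam):
    """L1 sparse-regularization term, lam * sum(|w|), added to the task loss."""
    return lam * sum(abs(w) for w in weights)


def magnitude_prune(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude `fraction` zeroed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = int(len(weights) * fraction)  # number of weights to remove
    # Indices ordered by ascending |w|; the first k are the ones pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Toy weight vector standing in for one layer's parameters (hypothetical values).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
penalty = l1_penalty(weights, lam=0.01)
sparse = magnitude_prune(weights, fraction=0.5)
```

The intuition behind "sparse then prune" in this sketch is that a sparsity penalty pushes many weights toward zero during training, so a later magnitude-based pruning step removes weights that already contribute little.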

Code & Models

Repositories

yogiprsty/sparse-vit (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Dropout