Sparse then Prune: Toward Efficient Vision Transformers
Yogi Prasetyo, Novanto Yudistira, Agus Wahyu Widodo

TL;DR
This paper applies Sparse Regularization and Pruning to Vision Transformers for image classification and reports that combining the two improves accuracy while reducing computational cost.
Contribution
It introduces a combined approach of Sparse Regularization and Pruning for Vision Transformers, showing improved accuracy and efficiency over traditional pruning methods.
Findings
Sparse Regularization increases accuracy by 0.12%.
Pruning after Sparse Regularization improves accuracy further, e.g., by 1.764% on CIFAR-100.
The method enhances Vision Transformer performance on multiple datasets.
Abstract
The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Regularization to Vision Transformers and the impact of Pruning, either after Sparse Regularization or without it, on the trade-off between performance and efficiency. To accomplish this, we apply Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process for the Vision Transformer model consists of two parts: pre-training and fine-tuning. Pre-training utilizes…
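To make the two ingredients in the abstract concrete, here is a minimal sketch in plain Python. It assumes an L1 penalty as the sparsity term and magnitude-based pruning as the pruning rule. The visible part of the abstract names neither, so both are illustrative choices, and the function names (`l1_penalty`, `magnitude_prune`, `sparsity`) are hypothetical. It is not the authors' implementation.

```python
from typing import List


def l1_penalty(weights: List[float], lam: float) -> float:
    """Sparsity term added to the task loss: lam * sum(|w|).

    Assumption: L1 is used as the sparse regularizer; `lam` is a
    hypothetical regularization strength.
    """
    return lam * sum(abs(w) for w in weights)


def magnitude_prune(weights: List[float], fraction: float) -> List[float]:
    """Zero out the `fraction` of weights with the smallest magnitude.

    Assumption: unstructured magnitude pruning, chosen for illustration.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = int(len(weights) * fraction)  # number of weights to remove
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def sparsity(weights: List[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
    print("L1 term:", l1_penalty(w, lam=0.01))
    pruned = magnitude_prune(w, fraction=0.5)
    print("pruned:", pruned, "sparsity:", sparsity(pruned))
```

The intuition behind "sparse then prune" under these assumptions: the penalty pushes many weights toward zero during training, so removing the smallest-magnitude weights afterwards discards less useful signal than pruning a model trained without the penalty.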
Taxonomy
Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Dropout
