Vision Transformer Pruning Via Matrix Decomposition

Tianyi Sun

arXiv:2308.10839·cs.CV·August 22, 2023

TL;DR

This paper extends Vision Transformer pruning with matrix decomposition, selecting Singular Value Decomposition (SVD) to further reduce the dimension and complexity of the linear projection while preserving accuracy.

Contribution

It implements and compares several matrix decomposition methods for pruning Vision Transformers (SVD, four versions of QR decomposition, and LU factorization) and selects SVD as the most effective approach.

Findings

01. SVD preserves accuracy better than other decompositions.

02. Matrix decomposition reduces model size and computational demands.

03. SVD outperforms QR and LU in maintaining original accuracy.

Abstract

This is a further development of Vision Transformer Pruning via matrix decomposition. The purpose of Vision Transformer Pruning is to prune the dimensions of the linear projection of the dataset by learning their associated importance scores, in order to reduce storage, run-time memory, and computational demands. In this paper, we further reduce the dimension and complexity of the linear projection by implementing and comparing several matrix decomposition methods while preserving the generated important features. We ultimately selected Singular Value Decomposition as the method to achieve our goal, by comparing the original accuracy scores in the original GitHub repository with the accuracy scores obtained using those matrix decomposition methods: Singular Value Decomposition, four versions of QR decomposition, and LU factorization.
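
To make the idea of reducing a linear projection by matrix decomposition concrete, the following is a minimal sketch of rank truncation with SVD in NumPy. It is not the paper's implementation: the function name `truncated_svd_factors`, the matrix sizes, the chosen rank, and the synthetic weight matrix are all assumptions made for illustration, and the abstract does not say which layers are decomposed or how the rank is chosen.

```python
import numpy as np


def truncated_svd_factors(W, rank):
    """Split W (d_in x d_out) into A (d_in x rank) and B (rank x d_out).

    A @ B is the best rank-`rank` approximation of W in the Frobenius norm.
    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept left singular vectors
    B = Vt[:rank, :]            # kept right singular vectors
    return A, B


rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 48, 8  # assumed sizes, not taken from the paper

# Synthetic weight matrix that is close to rank 8, plus small noise.
W = rng.standard_normal((d_in, rank)) @ rng.standard_normal((rank, d_out))
W += 0.01 * rng.standard_normal((d_in, d_out))

A, B = truncated_svd_factors(W, rank)

# Relative reconstruction error of the factored projection.
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

# Parameter counts: d_in * d_out for W versus rank * (d_in + d_out) for A and B.
params_full = W.size
params_factored = A.size + B.size

print(f"relative error: {rel_err:.4f}")
print(f"parameters: {params_full} -> {params_factored}")
```

Storing `A` and `B` instead of `W` saves parameters only when `rank * (d_in + d_out)` is smaller than `d_in * d_out`, so the achievable reduction depends on how quickly the singular values of the actual projection decay.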

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Retinal Imaging and Analysis · CCD and CMOS Imaging Sensors

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections