SparseSwin: Swin Transformer with Sparse Transformer Block

Krisna Pinasthika; Blessius Sheldo Putra Laksono; Riyandi Banovbi; Putera Irsal; Syifa Hukma Shabiyya; Novanto Yudistira

arXiv:2309.05224 · cs.CV · March 8, 2024 · 1 citation

TL;DR

SparseSwin introduces a sparse transformer block within the Swin Transformer architecture, significantly reducing parameters and computational complexity while maintaining high accuracy in image classification tasks.

Contribution

The paper proposes the Sparse Transformer (SparTa) Block with a sparse token converter and integrates it into Swin Transformer, enhancing efficiency and performance.
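
The core idea of the SparTa Block is to cut the number of tokens before attention is computed. The sketch below is a minimal NumPy illustration of that idea. It is not the authors' code. It assumes the converter is a learned linear projection along the token axis that maps N input tokens to M < N tokens, and it then applies single-head self-attention with a residual connection. The function names `sparse_token_converter` and `sparta_like_block`, the parameter names, and the sizes N=196, M=49, C=384 are hypothetical choices made for illustration. The official PyTorch repository is the reference for the real design.

```python
import numpy as np


def sparse_token_converter(x, w_tok, b_tok):
    # Hypothetical converter (assumption): a learned linear map along the
    # token axis that mixes N input tokens into M < N output tokens.
    # x: (N, C), w_tok: (M, N), b_tok: (M, 1) -> returns (M, C)
    return w_tok @ x + b_tok


def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product attention over the token axis.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (M, M) rather than (N, N)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v


def sparta_like_block(x, params):
    # Reduce the token count first, then attend with a residual connection.
    # LayerNorm and the MLP of a full transformer block are omitted.
    s = sparse_token_converter(x, params["w_tok"], params["b_tok"])
    return s + self_attention(s, params["w_q"], params["w_k"], params["w_v"])


rng = np.random.default_rng(0)
N, M, C = 196, 49, 384  # illustrative: a 14x14 token grid reduced to 49 tokens
params = {
    "w_tok": rng.standard_normal((M, N)) / np.sqrt(N),
    "b_tok": np.zeros((M, 1)),
    "w_q": rng.standard_normal((C, C)) / np.sqrt(C),
    "w_k": rng.standard_normal((C, C)) / np.sqrt(C),
    "w_v": rng.standard_normal((C, C)) / np.sqrt(C),
}
x = rng.standard_normal((N, C))
out = sparta_like_block(x, params)
print(x.shape, "->", out.shape)  # (196, 384) -> (49, 384)
```

The attention score matrix is M×M instead of N×N. In this illustrative setting that means 49² entries instead of 196². The channel width C is unchanged.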

Findings

1. Achieves 86.96% accuracy on ImageNet100.
2. Outperforms the compared state-of-the-art models on CIFAR10 (97.43% accuracy) and CIFAR100.
3. Uses fewer parameters while maintaining high accuracy.

Abstract

Advancements in computer vision research have made the transformer architecture the state of the art in computer vision tasks. One known drawback of the transformer architecture is its high number of parameters, which can lead to a more complex and less efficient algorithm. This paper aims to reduce the number of parameters and, in turn, make the transformer more efficient. We present the Sparse Transformer (SparTa) Block, a modified transformer block with an added sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin-T architecture (SparseSwin) to leverage Swin's capability to downsample its input and reduce the number of initial tokens to be processed. The proposed SparseSwin model outperforms other state-of-the-art models in image classification with an accuracy of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and…
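
The abstract relies on Swin's downsampling to reduce the number of tokens the SparTa Block has to handle. Self-attention over N tokens of width d costs on the order of N²·d operations, so a smaller token count pays off quadratically. The arithmetic below is a sketch under several assumptions:

- It counts only the two attention matrix products of a single head.
- It holds the width at 384 (Swin-T's stage-3 channel count) in every case, so that only the token count varies.
- It treats global attention over all 3,136 patch tokens as a hypothetical baseline. Swin itself uses windowed attention.
- The 49-token case is a hypothetical sparse token count, not a value taken from the paper.

```python
def attention_cost(num_tokens, dim):
    # Multiply-accumulates for Q @ K^T, (N x d)(d x N), plus
    # softmax(weights) @ V, (N x N)(N x d): 2 * N^2 * d in total.
    # Q/K/V projections, softmax, and the MLP are ignored.
    return 2 * num_tokens ** 2 * dim


# Swin-T on a 224x224 image with 4x4 patches starts from 56*56 = 3136 tokens;
# after two patch-merging steps the grid is 14*14 = 196 tokens.
full = attention_cost(56 * 56, 384)         # hypothetical global-attention baseline
downsampled = attention_cost(14 * 14, 384)  # token count after Swin downsampling
sparse = attention_cost(49, 384)            # hypothetical sparse token count

print(full // downsampled)   # 256, i.e. (3136 / 196) ** 2
print(downsampled // sparse)  # 16, i.e. (196 / 49) ** 2
```

Dividing the token count by 16 divides this attention cost by 256. A further 4x token reduction by a converter divides it by another 16.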

Code & Models

Repositories

krisnapinasthika/sparseswin
PyTorch · Official

Taxonomy

Topics: Brain Tumor Detection and Classification · Advanced Neural Network Applications · Retinal Imaging and Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Adam · Weight Decay · Cosine Annealing · Byte Pair Encoding