RaViTT: Random Vision Transformer Tokens

Felipe A. Quezada, Carlos F. Navarro, Cristian Muñoz, Manuel Zamorano, Jorge Jara-Wilde, Violeta Chang, Cristóbal A. Navarro, Mauricio Cerda

arXiv:2306.10959 · cs.CV · June 21, 2023 · 2 citations

TL;DR

RaViTT introduces a novel random patch sampling method for Vision Transformers (ViTs) that improves accuracy and reduces computational load. It is especially useful in data-scarce settings such as biomedical imaging.

Contribution

The paper proposes RaViTT, a random patch sampling strategy that can be incorporated into existing ViTs and that improves both accuracy and efficiency on four image-classification datasets, including ImageNet-1k and CIFAR-100.

Findings

1. RaViTT outperforms the baseline ViT on all tested datasets.
2. RaViTT surpasses state-of-the-art augmentation techniques on 3 of the 4 datasets.
3. RaViTT achieves accuracy gains with fewer tokens, reducing computational cost.
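The third finding rests on a general property of Transformers: self-attention compares every token with every other token, so the attention score matrix grows quadratically with the token count. The Python sketch below is generic arithmetic under stated assumptions: a 224×224 input, 16×16 patches, one class token, and hypothetical reduced budgets of 150 and 100 tokens. It does not reproduce the paper's measured savings, and the helper names are illustrative.

```python
def grid_token_count(image_size: int, patch_size: int) -> int:
    """Tokens produced by a non-overlapping square grid (class token excluded)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_scores(num_tokens: int, with_cls: bool = True) -> int:
    """Entries in one head's attention score matrix: (tokens + cls)^2."""
    n = num_tokens + (1 if with_cls else 0)
    return n * n


baseline = grid_token_count(224, 16)  # 14 * 14 = 196 patch tokens
for budget in (baseline, 150, 100):   # 150 and 100 are hypothetical budgets
    ratio = attention_scores(budget) / attention_scores(baseline)
    print(f"{budget:>3} tokens: {ratio:.2f}x the grid attention-score count")
```

Under these assumptions, 100 tokens need roughly a quarter of the attention-score entries of the 196-token grid. Only the attention-score term shrinks quadratically; the patch embedding, projections, and MLP layers scale linearly with the token count.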

Abstract

Vision Transformers (ViTs) have successfully been applied to image classification problems where large annotated datasets are available. On the other hand, when fewer annotations are available, such as in biomedical applications, image augmentation techniques like introducing image variations or combinations have been proposed. However, regarding ViT patch sampling, less has been explored outside grid-based strategies. In this work, we propose Random Vision Transformer Tokens (RaViTT), a random patch sampling strategy that can be incorporated into existing ViTs. We experimentally evaluated RaViTT for image classification, comparing it with a baseline ViT and state-of-the-art (SOTA) augmentation techniques in 4 datasets, including ImageNet-1k and CIFAR-100. Results show that RaViTT increases the accuracy of the baseline in all datasets and outperforms the SOTA augmentation techniques in…
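The contrast between grid-based and random patch sampling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The summary above does not specify how RaViTT chooses patch locations, sizes, or position encodings, so the sketch assumes fixed-size square patches whose top-left corners are drawn uniformly at random, with overlap allowed. It also returns the sampled corners so that a position embedding could be derived from them, which is likewise an assumption. The function names `grid_patches` and `random_patches` are hypothetical.

```python
import numpy as np


def grid_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Standard ViT tokenization: non-overlapping patches on a fixed grid.

    Returns an array of shape (num_patches, patch * patch * channels).
    """
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    x = image[: rows * patch, : cols * patch]  # drop any remainder pixels
    x = x.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(rows * cols, patch * patch * c)


def random_patches(image: np.ndarray, patch: int, num_tokens: int,
                   rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical random sampling: uniform top-left corners, overlap allowed.

    Returns (tokens, corners) with shapes (num_tokens, patch * patch * channels)
    and (num_tokens, 2); corners hold the (row, col) of each patch's top-left pixel.
    """
    h, w, _ = image.shape
    if patch > h or patch > w:
        raise ValueError("patch must fit inside the image")
    # integers() excludes the upper bound, so every patch stays inside the image
    ys = rng.integers(0, h - patch + 1, size=num_tokens)
    xs = rng.integers(0, w - patch + 1, size=num_tokens)
    tokens = np.stack([image[y:y + patch, x:x + patch].reshape(-1)
                       for y, x in zip(ys, xs)])
    corners = np.stack([ys, xs], axis=1)
    return tokens, corners


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
grid = grid_patches(img, 16)                          # shape (196, 768)
tokens, corners = random_patches(img, 16, 100, rng)   # shapes (100, 768), (100, 2)
print(grid.shape, tokens.shape, corners.shape)
```

Because the corners are redrawn on every call, each pass over the data would see a different set of tokens. That is one plausible way random sampling could act like augmentation. The `num_tokens` argument also lets the token budget fall below the 196 tokens a 16×16 grid produces on a 224×224 image.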

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Residual Connection