RaViTT: Random Vision Transformer Tokens
Felipe A. Quezada, Carlos F. Navarro, Cristian Muñoz, Manuel Zamorano, Jorge Jara-Wilde, Violeta Chang, Cristóbal A. Navarro, Mauricio Cerda

TL;DR
RaViTT introduces a random patch sampling method for Vision Transformers that improves accuracy and can reduce computational load, which is especially useful in data-scarce settings such as biomedical imaging.
Contribution
The paper proposes RaViTT, a new random patch sampling strategy for ViTs that enhances performance and efficiency across multiple datasets.
Findings
RaViTT outperforms baseline ViTs in all tested datasets.
RaViTT surpasses state-of-the-art augmentation techniques in 3 out of 4 datasets.
RaViTT achieves accuracy gains with fewer tokens, reducing computational costs.
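The link between token count and compute can be made concrete with a small sketch. It relies only on the general fact that self-attention scores every token against every other token, so the score matrix grows with the square of the sequence length. The function name `attention_score_entries` and the token counts below are illustrative assumptions, not figures from the paper, and the class token is ignored for simplicity.

```python
def attention_score_entries(n_tokens: int) -> int:
    """Number of entries in one self-attention score matrix (n_tokens x n_tokens)."""
    return n_tokens * n_tokens


# Hypothetical example: a 224x224 image cut into 16x16 grid patches gives
# (224 // 16) ** 2 = 196 tokens; sampling half as many tokens gives 98.
grid_tokens = (224 // 16) ** 2
sampled_tokens = grid_tokens // 2

full_cost = attention_score_entries(grid_tokens)
reduced_cost = attention_score_entries(sampled_tokens)

# Halving the token count quarters the size of the score matrix.
ratio = full_cost / reduced_cost
```

Under these assumptions, halving the number of tokens cuts the per-head attention score matrix to a quarter of its size. The other layers of a transformer scale linearly with the token count.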
Abstract
Vision Transformers (ViTs) have successfully been applied to image classification problems where large annotated datasets are available. On the other hand, when fewer annotations are available, such as in biomedical applications, image augmentation techniques like introducing image variations or combinations have been proposed. However, regarding ViT patch sampling, less has been explored outside grid-based strategies. In this work, we propose Random Vision Transformer Tokens (RaViTT), a random patch sampling strategy that can be incorporated into existing ViTs. We experimentally evaluated RaViTT for image classification, comparing it with a baseline ViT and state-of-the-art (SOTA) augmentation techniques in 4 datasets, including ImageNet-1k and CIFAR-100. Results show that RaViTT increases the accuracy of the baseline in all datasets and outperforms the SOTA augmentation techniques in…
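To illustrate the difference between grid-based and random patch sampling, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes fixed-size square patches drawn at uniformly random positions (so patches may overlap), and the helper names `grid_patches` and `random_patches` are hypothetical. The abstract does not specify the paper's sampling distribution or how positions are encoded.

```python
import numpy as np


def grid_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Standard ViT tokenization: non-overlapping patches on a regular grid."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[: rows * patch, : cols * patch]
    # Split both spatial axes into (block index, offset within block).
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch * patch * c)


def random_patches(image: np.ndarray, patch: int, n_tokens: int,
                   rng: np.random.Generator):
    """Assumed variant: n_tokens patches at uniformly random top-left corners."""
    h, w, _ = image.shape
    ys = rng.integers(0, h - patch + 1, size=n_tokens)
    xs = rng.integers(0, w - patch + 1, size=n_tokens)
    tokens = np.stack(
        [image[y : y + patch, x : x + patch].reshape(-1) for y, x in zip(ys, xs)]
    )
    # Keep the sampled positions so a position encoding could use them.
    positions = np.stack([ys, xs], axis=1)
    return tokens, positions


# Usage on a dummy 32x32 RGB image with 8x8 patches.
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
grid = grid_patches(image, patch=8)            # 16 tokens on the grid
rand, pos = random_patches(image, 8, 10, np.random.default_rng(0))  # 10 tokens
```

In this sketch the number of random tokens is a free parameter, independent of image size, so it can be set below the grid count. Each flattened patch would then go through the same linear projection a ViT applies to grid patches.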
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Residual Connection
