VariViT: A Vision Transformer for Variable Image Sizes
Aswathi Varma, Suprosanna Shit, Chinmay Prabhakar, Daniel Scholz, Hongwei Bran Li, Bjoern Menze, Daniel Rueckert, Benedikt Wiestler

TL;DR
VariViT is a novel Vision Transformer designed to handle variable-sized images, particularly in medical imaging. It improves accuracy and computational efficiency over conventional fixed-size ViTs.
Contribution
It introduces a new positional-embedding resizing scheme and a batching strategy that together enable a ViT to process variable image sizes efficiently.
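The summary does not spell out how VariViT resizes its positional embeddings. A widely used way to adapt a learned ViT positional-embedding grid to a different token grid is interpolation. The sketch below applies trilinear interpolation to a 3D grid of embedding vectors, as would arise from patching 3D MRI volumes. It illustrates that generic technique under stated assumptions (embeddings stored as a `[depth][height][width]` grid of vectors, align-corners coordinate mapping). It does not reproduce VariViT's own scheme, and the name `resize_pos_embed_3d` is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Grid3D = List[List[List[Vector]]]  # [depth][height][width] -> embedding vector


def _axis_coords(src_len: int, dst_len: int) -> List[float]:
    """Map each destination index to a fractional source index (align-corners style)."""
    if dst_len == 1 or src_len == 1:
        return [0.0] * dst_len
    scale = (src_len - 1) / (dst_len - 1)
    return [i * scale for i in range(dst_len)]


def _lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Element-wise linear interpolation between two embedding vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def resize_pos_embed_3d(grid: Grid3D, new_shape: Tuple[int, int, int]) -> Grid3D:
    """Hypothetical helper: trilinearly resample a 3D positional-embedding grid."""
    d0, h0, w0 = len(grid), len(grid[0]), len(grid[0][0])
    d1, h1, w1 = new_shape
    zs, ys, xs = _axis_coords(d0, d1), _axis_coords(h0, h1), _axis_coords(w0, w1)

    def sample(z: float, y: float, x: float) -> Vector:
        # Integer neighbours on each axis, clamped to the last valid index.
        z0, y0, x0 = int(z), int(y), int(x)
        z1, y1, x1 = min(z0 + 1, d0 - 1), min(y0 + 1, h0 - 1), min(x0 + 1, w0 - 1)
        tz, ty, tx = z - z0, y - y0, x - x0

        def row(zz: int, yy: int) -> Vector:
            # Interpolate along width.
            return _lerp(grid[zz][yy][x0], grid[zz][yy][x1], tx)

        def plane(zz: int) -> Vector:
            # Interpolate along height.
            return _lerp(row(zz, y0), row(zz, y1), ty)

        # Interpolate along depth.
        return _lerp(plane(z0), plane(z1), tz)

    return [[[sample(z, y, x) for x in xs] for y in ys] for z in zs]
```

As a quick sanity check, interpolating a linear ramp of embeddings reproduces the ramp exactly at the new grid points. Resizing to the original shape returns the original grid.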
Findings
Outperforms vanilla ViTs and ResNet in brain tumor classification and glioma genotype prediction.
Achieves F1-scores of 75.5% and 76.3% on two 3D brain MRI datasets.
Reduces computation time by up to 30% with the new batching strategy.
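The summary does not describe how VariViT's batching strategy works. One plausible reading, offered here purely as an assumption, is size bucketing. Samples are grouped by input shape so that every batch contains volumes of a single size, which avoids padding each volume up to the largest one in the batch. The sketch below shows that generic idea. The function `size_bucketed_batches` is hypothetical and does not claim to match the paper's method.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Shape = Tuple[int, int, int]


def size_bucketed_batches(shapes: Sequence[Shape], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: group sample indices so each batch holds one input shape."""
    buckets: Dict[Shape, List[int]] = defaultdict(list)
    for idx, shape in enumerate(shapes):
        buckets[shape].append(idx)

    rng = random.Random(seed)
    batches: List[List[int]] = []
    for indices in buckets.values():
        rng.shuffle(indices)  # shuffle within a size bucket
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])

    rng.shuffle(batches)  # mix different shapes across the epoch
    return batches
```

A mixed-size padded batch would spend compute on padding tokens. Under this assumption, every token in a bucketed batch comes from real image content, at the cost of some batches being smaller than `batch_size`.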
Abstract
Vision Transformers (ViTs) have emerged as the state-of-the-art architecture in representation learning, leveraging self-attention mechanisms to excel in various tasks. ViTs split images into fixed-size patches, constraining them to a predefined size and necessitating pre-processing steps like resizing, padding, or cropping. This poses challenges in medical imaging, particularly with irregularly shaped structures like tumors. A fixed bounding box crop size produces input images with highly variable foreground-to-background ratios. Resizing medical images can degrade information and introduce artefacts, impacting diagnosis. Hence, tailoring variable-sized crops to regions of interest can enhance feature representation capabilities. Moreover, large images are computationally expensive, and smaller sizes risk information loss, presenting a computation-accuracy tradeoff. We propose VariViT,…
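Token arithmetic makes the abstract's point about fixed-size patches and the computation-accuracy tradeoff concrete. A volume of shape D × H × W split into non-overlapping p × p × p patches gives a ViT N = (D/p)(H/p)(W/p) tokens. Each full self-attention layer then scores N² query-key pairs. The crop sizes and the 16-voxel patch size below are illustrative assumptions rather than values from the paper. The helper names are hypothetical.

```python
from math import prod
from typing import Tuple

Shape = Tuple[int, ...]


def num_patches(shape: Shape, patch: Shape) -> int:
    """Number of non-overlapping patch tokens a ViT produces for one image."""
    if len(shape) != len(patch):
        raise ValueError("shape and patch must have the same number of dimensions")
    if any(s % p for s, p in zip(shape, patch)):
        # A standard ViT needs every side divisible by the patch size,
        # which is why inputs are usually resized, padded, or cropped first.
        raise ValueError(f"{shape} is not divisible by patch size {patch}")
    return prod(s // p for s, p in zip(shape, patch))


def attention_pairs(shape: Shape, patch: Shape) -> int:
    """Query-key pairs scored by one full self-attention layer (class token ignored)."""
    n = num_patches(shape, patch)
    return n * n


if __name__ == "__main__":
    patch = (16, 16, 16)
    for side in (64, 96, 128):
        shape = (side, side, side)
        print(shape, num_patches(shape, patch), attention_pairs(shape, patch))
    # (64, 64, 64) 64 4096
    # (96, 96, 96) 216 46656
    # (128, 128, 128) 512 262144
```

Growing the crop side from 64 to 128 voxels multiplies the token count by 8 and the attention pairs by 64. A single fixed crop size therefore forces a choice between computational cost and context.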
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Generative Adversarial Networks and Image Synthesis
