Spectral Vision Transformer for Efficient Tokenization with Limited Data

Alexandra G. Roberts; Maneesh John; Jinwei Zhang; Dominick Romano; Mert Sisman; Ki Sueng Choi; Heejong Kim; Mert R. Sabuncu; Thanh D. Nguyen; Alexey V. Dimov; Pascal Spincemaille; Brian H. Kopell; Yi Wang

arXiv:2605.12026·cs.CV·May 13, 2026

TL;DR

This paper introduces a spectral vision transformer architecture optimized for limited data scenarios, especially in medical imaging, demonstrating reduced complexity and competitive performance.

Contribution

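The spectral vision transformer tokenizes images in a spectral basis rather than in spatial patches. The authors state that this choice of basis gives spatial invariance and optimal signal-to-noise ratio, and that the spectral projection reduces complexity relative to spatial vision transformers. The code is publicly available.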

Findings

01 Achieves comparable or superior accuracy with fewer parameters.

02 Reduces model complexity through spectral projection.

03 Performs well across simulated, public, and clinical datasets.

Abstract

We propose a novel spectral vision transformer architecture for efficient tokenization with limited data, with an emphasis on medical imaging. We outline convenient theoretical properties arising from the choice of basis, including spatial invariance and optimal signal-to-noise ratio. We show that the spectral projection reduces complexity compared to spatial vision transformers. We show comparable or superior performance with fewer parameters than a variety of models, including compact and standard vision transformers, convolutional neural networks with attention, shifted window transformers, multi-layer perceptrons, and logistic regression. We include simulated, public, and clinical data in our analysis and release our code at github.com/agr78/spectralViT.
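The abstract does not say which basis the authors use or how tokens are formed, so the following is only a minimal sketch of the general idea, assuming a discrete Fourier basis and magnitude-only tokens. The names `dft2`, `spectral_tokens`, and `circular_shift` are hypothetical and do not come from the released code. The sketch illustrates two things under those assumptions: keeping a few low-frequency coefficients yields far fewer tokens than there are pixels, and Fourier magnitudes do not change when the image is circularly shifted.

```python
import cmath
import math


def dft2(image):
    """Naive 2D discrete Fourier transform of a square list-of-lists image."""
    n = len(image)
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = -2.0 * math.pi * (u * x + v * y) / n
                    total += image[x][y] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def spectral_tokens(image, keep):
    """Magnitudes of the coefficients with frequency indices 0..keep-1
    along each axis, flattened into a list (an assumed tokenization)."""
    spectrum = dft2(image)
    return [abs(spectrum[u][v]) for u in range(keep) for v in range(keep)]


def circular_shift(image, dx, dy):
    """Shift a square image by (dx, dy) pixels with wrap-around."""
    n = len(image)
    return [[image[(x - dx) % n][(y - dy) % n] for y in range(n)]
            for x in range(n)]


# Toy 8x8 image with arbitrary non-negative intensities.
n = 8
image = [[float((3 * x + 5 * y) % 7) for y in range(n)] for x in range(n)]
shifted = circular_shift(image, 2, 3)

tokens = spectral_tokens(image, keep=3)
tokens_shifted = spectral_tokens(shifted, keep=3)

# Largest change in any token caused by the shift (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(tokens, tokens_shifted))

print(len(tokens), "spectral tokens from", n * n, "pixels")
print("max token change under circular shift:", max_diff)
```

Because self-attention cost grows with the square of the token count, a shorter token sequence is one plausible source of the reduced complexity the abstract reports. The paper itself should be consulted for the actual basis and projection.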

Code & Models

Repositories

agr78/spectralViT
github
