Do We Need Reformer for Vision? An Experimental Comparison with Vision Transformers

Ali El Bellaj; Mohammed-Amine Cheddadi; Rhassan Berber

arXiv:2512.11260·cs.CV·January 8, 2026

TL;DR

This paper compares Reformer and Vision Transformers for computer vision tasks, finding that while Reformers are theoretically more efficient, Vision Transformers outperform them in practical accuracy and efficiency on larger, high-resolution datasets.

Contribution

It provides an experimental comparison of Reformer and Vision Transformers in vision tasks, highlighting the practical limitations of LSH attention for typical image sizes.

Findings

01

Reformer achieves higher accuracy than the Vision Transformer on CIFAR-10.

02

Vision Transformers outperform Reformer in efficiency on larger datasets.

03

LSH attention's theoretical benefits are limited for standard image resolutions.
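To make the third finding concrete, the following minimal sketch counts patch tokens for a few image sizes and compares the asymptotic terms $n^2$ and $n \log_2 n$. The patch sizes and the 224 and 1024 pixel resolutions are illustrative assumptions, not settings taken from the paper, and the ratio ignores every constant factor (number of hash rounds, sorting, chunking), so it is only an upper bound on the idealized gain.

```python
import math


def num_patches(image_size: int, patch_size: int) -> int:
    """Number of tokens when a square image is cut into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_cost_ratio(n: int) -> float:
    """Idealized ratio n^2 / (n * log2 n) = n / log2 n, constants ignored."""
    return n / math.log2(n)


if __name__ == "__main__":
    # (image_size, patch_size) pairs are hypothetical examples.
    for image_size, patch_size in [(32, 4), (224, 16), (1024, 16)]:
        n = num_patches(image_size, patch_size)
        print(image_size, patch_size, n, round(attention_cost_ratio(n), 1))
```

With only a few hundred tokens the idealized ratio is modest, so constant overheads of hashing and sorting can plausibly outweigh it, which is in line with the finding above.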

Abstract

Transformers have recently demonstrated strong performance in computer vision, with Vision Transformers (ViTs) leveraging self-attention to capture both low-level and high-level image features. However, standard ViTs remain computationally expensive, since global self-attention scales quadratically with the number of tokens, which limits their practicality for high-resolution inputs and resource-constrained settings. In this work, we investigate the Reformer architecture as an alternative vision backbone. By combining patch-based tokenization with locality-sensitive hashing (LSH) attention, our model approximates global self-attention while reducing its theoretical time complexity from $O(n^{2})$ to $O(n \log n)$ in the sequence length $n$. We evaluate the proposed Reformer-based vision model on CIFAR-10 to assess its behavior on small-scale datasets, on ImageNet-100…
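As a rough illustration of the bucketing step behind LSH attention, the sketch below assigns token vectors to buckets with the angular hashing scheme described in the original Reformer paper: project with a random matrix R and take the argmax over the concatenation of xR and -xR. It is a pure-Python sketch under stated assumptions, not the authors' implementation; the function names, dimensions, and seeds are hypothetical, and the within-bucket attention itself is omitted.

```python
import random


def make_projection(dim: int, n_buckets: int, seed: int = 0) -> list[list[float]]:
    """Random Gaussian matrix R of shape [dim, n_buckets // 2]."""
    if n_buckets % 2 != 0:
        raise ValueError("n_buckets must be even")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)]
            for _ in range(dim)]


def lsh_bucket(vec: list[float], projection: list[list[float]]) -> int:
    """Bucket id = argmax over the concatenation [xR, -xR]."""
    half = len(projection[0])
    proj = [sum(vec[i] * projection[i][j] for i in range(len(vec)))
            for j in range(half)]
    scores = proj + [-p for p in proj]
    return max(range(len(scores)), key=scores.__getitem__)


def group_by_bucket(tokens: list[list[float]],
                    projection: list[list[float]]) -> dict[int, list[int]]:
    """Map each bucket id to the indices of the tokens hashed into it."""
    buckets: dict[int, list[int]] = {}
    for idx, tok in enumerate(tokens):
        buckets.setdefault(lsh_bucket(tok, projection), []).append(idx)
    return buckets
```

Attention would then be computed only among tokens that share a bucket, which is where the reduction from the quadratic cost comes from. The bucket id depends only on the direction of a vector, so tokens pointing in similar directions tend to land together.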

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Retinal Imaging and Analysis