Do We Need Reformer for Vision? An Experimental Comparison with Vision Transformers
Ali El Bellaj, Mohammed-Amine Cheddadi, Rhassan Berber

TL;DR
This paper compares the Reformer with Vision Transformers (ViTs) on computer vision tasks. Although the Reformer is theoretically more efficient, ViTs outperform it in practical accuracy and efficiency on larger, high-resolution datasets.
Contribution
It provides an experimental comparison of Reformer and Vision Transformers in vision tasks, highlighting the practical limitations of LSH attention for typical image sizes.
Findings
Reformer achieves higher accuracy on CIFAR-10.
Vision Transformers outperform Reformer in efficiency on larger datasets.
LSH attention's theoretical benefits are limited for standard image resolutions.
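A rough way to see why the last finding is plausible is to count tokens. The sketch below is not taken from the paper: the image and patch sizes are hypothetical choices, and the function names are illustrative. It computes the number of patch tokens n and the idealized ratio n^2 / (n log2 n), which ignores every constant factor.

```python
import math


def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_cost_ratio(n: int) -> float:
    """Idealized ratio n^2 / (n log2 n) = n / log2 n, ignoring constants."""
    return n / math.log2(n)


# Hypothetical (image_size, patch_size) pairs; the source does not state
# which patch sizes were used in the experiments.
for image_size, patch_size in [(32, 4), (224, 16), (1024, 16)]:
    n = num_patch_tokens(image_size, patch_size)
    print(f"{image_size}px / {patch_size}px patches: "
          f"n = {n}, n^2 / (n log2 n) = {attention_cost_ratio(n):.1f}")
```

Under these assumptions a 224-pixel image with 16-pixel patches yields only 196 tokens, so the asymptotic advantage is a modest constant. LSH attention also adds hashing and sorting work that this ratio does not count, which may help explain why the benefit does not show up in practice at standard resolutions.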
Abstract
Transformers have recently demonstrated strong performance in computer vision, with Vision Transformers (ViTs) leveraging self-attention to capture both low-level and high-level image features. However, standard ViTs remain computationally expensive, since global self-attention scales quadratically with the number of tokens, which limits their practicality for high-resolution inputs and resource-constrained settings. In this work, we investigate the Reformer architecture as an alternative vision backbone. By combining patch-based tokenization with locality-sensitive hashing (LSH) attention, our model approximates global self-attention while reducing its theoretical time complexity from O(n^2) to O(n log n) in the sequence length n. We evaluate the proposed Reformer-based vision model on CIFAR-10 to assess its behavior on small-scale datasets, on ImageNet-100…
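The bucketing step behind LSH attention can be sketched in a few lines of NumPy. This is a minimal illustration of angular LSH via a random projection, not the authors' implementation: the token count, embedding size, bucket count, and function names are all assumptions, and the sorting, chunking, and multi-round hashing that a full Reformer layer uses are omitted.

```python
import numpy as np


def lsh_buckets(tokens: np.ndarray, n_buckets: int,
                rng: np.random.Generator) -> np.ndarray:
    """Assign each token vector to a bucket using one random projection."""
    if n_buckets % 2 != 0:
        raise ValueError("n_buckets must be even")
    dim = tokens.shape[-1]
    # Random projection matrix R with shape (dim, n_buckets / 2).
    projection = rng.standard_normal((dim, n_buckets // 2))
    rotated = tokens @ projection
    # Bucket id is the argmax over the concatenation [xR ; -xR].
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)


def bucketed_attention_pairs(buckets: np.ndarray) -> int:
    """Query-key pairs compared if attention is limited to same-bucket tokens."""
    counts = np.bincount(buckets)
    return int(np.sum(counts ** 2))


rng = np.random.default_rng(0)
n_tokens, dim = 196, 64  # hypothetical: 14 x 14 patches, 64-d embeddings
tokens = rng.standard_normal((n_tokens, dim))  # stand-in for patch embeddings

buckets = lsh_buckets(tokens, n_buckets=8, rng=rng)
full_pairs = n_tokens ** 2
lsh_pairs = bucketed_attention_pairs(buckets)
print(f"full attention pairs: {full_pairs}, same-bucket pairs: {lsh_pairs}")
```

Tokens that point in similar directions tend to share a bucket, so restricting attention to same-bucket tokens approximates global attention while comparing fewer pairs. With random stand-in embeddings the bucket sizes are roughly uniform; real patch embeddings may cluster differently.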
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Retinal Imaging and Analysis
