ViR: the Vision Reservoir
Xian Wei, Bin Wang, Mingsong Chen, Ji Yuan, Hai Lan, Jiehuang Shi, Xuan Tang, Bo Jin, Guozhang Chen, Dongping Yang

TL;DR
The paper introduces Vision Reservoir (ViR), a novel image classification method that replaces Transformer modules with a reservoir computing approach, reducing complexity and improving performance without pre-training.
Contribution
Proposes ViR, a reservoir computing-based alternative to ViT, addressing high computation and overfitting issues in image classification.
Findings
ViR outperforms ViT in accuracy without pre-training.
ViR has significantly fewer parameters and lower memory usage.
ViR demonstrates superior performance on multiple benchmarks.
Abstract
The past year has witnessed the success of applying the Vision Transformer (ViT) to image classification. However, there is still evidence that ViT often suffers in two respects: i) the high computation and memory burden of applying multiple Transformer layers for pre-training on a large-scale dataset, and ii) over-fitting when training on small datasets from scratch. To address these problems, a novel method, Vision Reservoir computing (ViR), is proposed here for image classification, as a parallel to ViT. By splitting each image into a sequence of tokens with fixed length, the ViR constructs a pure reservoir with a nearly fully connected topology to replace the Transformer module in ViT. Two kinds of deep ViR models are subsequently proposed to enhance the network performance. Comparative experiments between the ViR and the ViT are carried…
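The pipeline the abstract describes (split an image into fixed-length tokens, then feed them through a reservoir instead of a Transformer block) can be illustrated with a minimal sketch. This is a generic echo state network written in NumPy, not the authors' implementation: the function names, the patch size, the connection density of 0.9 (standing in for "nearly fully connected"), the spectral radius, and the leak rate are all assumptions chosen for illustration, and the trainable readout that would sit on top of the reservoir states is omitted.

```python
import numpy as np


def image_to_tokens(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patch tokens."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "patch must divide H and W"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def make_reservoir(n_in, n_res, density=0.9, spectral_radius=0.9, seed=0):
    """Build fixed random weights (hypothetical settings, never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Drop a small fraction of connections: "nearly fully connected".
    w *= rng.random((n_res, n_res)) < density
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(tokens, w_in, w, leak=1.0):
    """Drive the reservoir with the token sequence; return all states."""
    state = np.zeros(w.shape[0])
    states = []
    for u in tokens:
        update = np.tanh(w_in @ u + w @ state)
        state = (1.0 - leak) * state + leak * update
        states.append(state)
    return np.stack(states)


# Usage on a random stand-in for a 32x32 RGB image.
image = np.random.default_rng(1).random((32, 32, 3))
tokens = image_to_tokens(image, patch=8)          # 16 tokens of length 192
w_in, w = make_reservoir(n_in=tokens.shape[1], n_res=64)
states = run_reservoir(tokens, w_in, w)           # one state per token
```

In this sketch only a readout layer on top of `states` would need training, because the reservoir weights stay fixed after initialization. That is the usual reason reservoir models have few trainable parameters, though the exact ViR architecture may differ from this simplification.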
Taxonomy
Topics: Neural Networks and Reservoir Computing · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Byte Pair Encoding · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Adam
