Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement
Ange-Clément Akazan, Abdoulaye Koroko, Verlon Roel Mbingui, Choukouriyah Arinloye, Hassan Fifen, Rose Bandolo

TL;DR
ViTRM is a recursive, parameter-efficient vision model that achieves competitive image classification performance on CIFAR-10 and CIFAR-100 with up to 6x fewer parameters than CNNs and up to 84x fewer than ViT.
Contribution
The paper proposes ViTRM, a recursive vision architecture that replaces deep encoders with a tiny recursive block, reducing parameters while maintaining performance.
Findings
ViTRM uses up to 6x fewer parameters than CNNs.
ViTRM uses up to 84x fewer parameters than ViT.
ViTRM performs competitively on CIFAR-10 and CIFAR-100.
Abstract
The success of deep learning in computer vision has been driven by models of increasing scale, from deep Convolutional Neural Networks (CNNs) to large Vision Transformers (ViTs). While effective, these architectures are parameter-intensive and demand significant computational resources, limiting deployment in resource-constrained environments. Inspired by Tiny Recursive Models (TRM), which show that small recursive networks can solve complex reasoning tasks through iterative state refinement, we introduce the Vision Tiny Recursion Model (ViTRM): a parameter-efficient architecture that replaces the multi-layer ViT encoder with a single tiny block of far fewer layers, applied recursively multiple times. Despite using up to 6x and 84x fewer parameters than CNN-based models and ViT respectively, ViTRM maintains competitive performance on CIFAR-10 and CIFAR-100. This…
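The recursive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the block here is a plain residual MLP standing in for the paper's tiny block, and the names (`init_block`, `apply_block`, `recursive_refine`) and the sizes (`dim`, `hidden`, `n_steps`) are hypothetical choices made only for illustration. It shows that reusing one set of weights at every step keeps the parameter count independent of the recursion depth, whereas a conventional stack of distinct layers grows linearly with depth.

```python
import numpy as np

def init_block(dim, hidden, rng):
    """Weights for one tiny residual MLP block (a stand-in, not the paper's block)."""
    return {
        "w1": rng.normal(0.0, 0.02, (dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, (hidden, dim)),
        "b2": np.zeros(dim),
    }

def apply_block(params, x):
    """One refinement step: residual update of the token states."""
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    return x + h @ params["w2"] + params["b2"]

def recursive_refine(params, x, n_steps):
    """Apply the SAME block n_steps times (weights are shared across steps)."""
    for _ in range(n_steps):
        x = apply_block(params, x)
    return x

def count_params(params):
    return sum(int(p.size) for p in params.values())

rng = np.random.default_rng(0)
dim, hidden, n_steps = 64, 128, 4      # hypothetical sizes, not from the paper
block = init_block(dim, hidden, rng)
tokens = rng.normal(size=(16, dim))    # e.g. 16 patch embeddings

refined = recursive_refine(block, tokens, n_steps)

shared_count = count_params(block)       # one block, reused at every step
stacked_count = n_steps * shared_count   # same depth built from distinct layers
print(refined.shape, shared_count, stacked_count)
```

With these illustrative sizes the shared block holds 16,576 parameters regardless of `n_steps`, while an unshared stack of equal depth would hold `n_steps` times as many. The ratios reported in the paper depend on its actual block design and baselines, which this sketch does not reproduce.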
