Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement

Ange-Clément Akazan; Abdoulaye Koroko; Verlon Roel Mbingui; Choukouriyah Arinloye; Hassan Fifen; Rose Bandolo

arXiv:2603.19503 · cs.CV · April 2, 2026

TL;DR

ViTRM is a recursive, parameter-efficient vision model that stays competitive on CIFAR-10 and CIFAR-100 image classification while using up to 6× fewer parameters than CNN-based models and up to 84× fewer than ViT.

Contribution

The paper proposes ViTRM, a recursive vision architecture that replaces the deep ViT encoder with a single tiny block applied recursively, reducing the parameter count while maintaining competitive performance.
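
The idea of reusing one small block in place of a deep stack can be illustrated with a toy, weight-tied loop. This is a minimal sketch and not the authors' implementation: the block here is a hypothetical stand-in (a residual `tanh` layer on a plain vector rather than a transformer layer on patch tokens), and the names `make_block`, `apply_block`, `refine`, and `count_params` are invented for illustration.

```python
import math
import random

def make_block(dim, k, seed=0):
    """Create k toy layers, each a dim x dim weight matrix.

    Hypothetical stand-in for the tiny k-layer block; a real block
    would contain attention and MLP sublayers.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [
        [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
        for _ in range(k)
    ]

def apply_block(block, state):
    """Pass the state through all k layers once, with a residual per layer."""
    for weights in block:
        update = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]
        state = [s + u for s, u in zip(state, update)]
    return state

def refine(block, state, n_steps):
    """Apply the SAME block n_steps times: effective depth grows, parameters do not."""
    for _ in range(n_steps):
        state = apply_block(block, state)
    return state

def count_params(block):
    """Number of weights stored in the block (independent of n_steps)."""
    return sum(len(row) for weights in block for row in weights)

block = make_block(dim=8, k=3)
initial = [0.1] * 8
refined = refine(block, initial, n_steps=4)
print(count_params(block))  # 3 layers * 8 * 8 weights = 192
```

Running `refine` with a larger `n_steps` increases compute but leaves `count_params(block)` unchanged, which is the property the recursive design relies on.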

Findings

01. ViTRM uses up to 6× fewer parameters than CNN-based models.

02. ViTRM uses up to 84× fewer parameters than ViT.

03. ViTRM performs competitively on CIFAR-10 and CIFAR-100.

Abstract

The success of deep learning in computer vision has been driven by models of increasing scale, from deep Convolutional Neural Networks (CNNs) to large Vision Transformers (ViTs). While effective, these architectures are parameter-intensive and demand significant computational resources, limiting deployment in resource-constrained environments. Inspired by Tiny Recursive Models (TRM), which show that small recursive networks can solve complex reasoning tasks through iterative state refinement, we introduce the Vision Tiny Recursion Model (ViTRM): a parameter-efficient architecture that replaces the $L$-layer ViT encoder with a single tiny $k$-layer block ($k = 3$) applied recursively $N$ times. Despite using up to $6\times$ and $84\times$ fewer parameters than CNN-based models and ViT, respectively, ViTRM maintains competitive performance on CIFAR-10 and CIFAR-100. This…
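
To make the parameter argument concrete, the following sketch counts encoder parameters for an $L$-layer stack versus a $k$-layer block reused $N$ times. It assumes a standard transformer encoder layer (four attention projections with biases, a two-layer MLP with expansion ratio 4, two LayerNorms) and ViT-Base-like values (`d_model = 768`, `L = 12`); none of these values come from the abstract, and the function names are hypothetical. It does not attempt to reproduce the reported 6× or 84× figures, which depend on configuration details not given here.

```python
def encoder_layer_params(d_model, mlp_ratio=4):
    """Parameters in one standard transformer encoder layer (assumed layout)."""
    # Q, K, V and output projections, each d_model x d_model plus a bias.
    attn = 4 * d_model * d_model + 4 * d_model
    # Two-layer MLP: d_model -> hidden -> d_model, with biases.
    hidden = mlp_ratio * d_model
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    norms = 2 * (2 * d_model)
    return attn + mlp + norms

def encoder_params(d_model, n_layers, mlp_ratio=4):
    """Parameters in a conventional stack of n_layers distinct layers."""
    return n_layers * encoder_layer_params(d_model, mlp_ratio)

def recursive_encoder_params(d_model, k, n_recursions, mlp_ratio=4):
    """Parameters in a k-layer block reused n_recursions times.

    n_recursions is deliberately unused: weight sharing means the count
    depends only on k, not on how many times the block is applied.
    """
    return encoder_params(d_model, k, mlp_ratio)

deep = encoder_params(d_model=768, n_layers=12)
tiny = recursive_encoder_params(d_model=768, k=3, n_recursions=4)
print(deep, tiny, deep // tiny)  # 85054464 21263616 4
```

Under these assumptions, depth sharing alone yields a reduction of $L / k$ at equal width (here $12 / 3 = 4$), so any larger ratio would have to come from additional differences such as a narrower embedding dimension.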
