Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models

Ruihan Xu; Yuting Gao; Lan Wang; Jianing Li; Weihao Chen; Qingpei Guo; Ming Yang; Shiliang Zhang

arXiv:2602.09080 · cs.LG · February 11, 2026

TL;DR

This paper introduces RecursiveVLM, a recursive Transformer architecture for large multimodal models that reuses parameters through recursive refinement, improving efficiency and performance without increasing model size.

Contribution

It proposes a novel recursive Transformer design with a Recursive Connector and Monotonic Recursion Loss to enhance multimodal representations and enable resource-adaptive inference.
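The looping idea can be sketched in plain Python. This is a minimal, hypothetical toy, not the authors' implementation. `ToyLayer` stands in for a Transformer layer: it is a residual MLP with no attention, so each token is processed independently. `RecursiveConnector` assumes that "fusing intermediate-layer hidden states" means averaging the states of a few tapped layers before a vision- or text-specific projection. The same `layers` list is reused at every recursion step, so depth grows without adding parameters. `num_steps` is the assumed knob for resource-adaptive inference.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(d: int, rng: random.Random, scale: float = 0.3) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(d)] for _ in range(d)]


class ToyLayer:
    """Hypothetical stand-in for a Transformer layer (residual MLP, no attention)."""

    def __init__(self, d: int, rng: random.Random) -> None:
        self.w = rand_matrix(d, rng)

    def __call__(self, x: Vector) -> Vector:
        return [xi + math.tanh(yi) for xi, yi in zip(x, matvec(self.w, x))]


class RecursiveConnector:
    """Hypothetical connector: average the hidden states of the tapped layers,
    then apply a separate projection per modality (vision vs. text)."""

    def __init__(self, d: int, tap_layers: List[int], rng: random.Random) -> None:
        self.tap_layers = tap_layers
        self.proj: Dict[str, Matrix] = {
            "vision": rand_matrix(d, rng),
            "text": rand_matrix(d, rng),
        }

    def __call__(self, states: List[Vector], modality: str) -> Vector:
        picked = [states[i] for i in self.tap_layers]
        fused = [sum(vals) / len(picked) for vals in zip(*picked)]
        return matvec(self.proj[modality], fused)


def recursive_forward(
    tokens: List[Vector],
    modalities: List[str],
    layers: List[ToyLayer],
    connector: RecursiveConnector,
    num_steps: int,
) -> List[List[Vector]]:
    """Run the SAME layer stack num_steps times and return every step's output."""
    outputs: List[List[Vector]] = []
    h = tokens
    for _ in range(num_steps):
        step_out: List[Vector] = []
        next_in: List[Vector] = []
        for x, modality in zip(h, modalities):
            states = []
            for layer in layers:  # shared weights: no new parameters per step
                x = layer(x)
                states.append(x)
            step_out.append(x)
            # Next step's input: final state plus connector-aligned intermediate features.
            aligned = connector(states, modality)
            next_in.append([a + b for a, b in zip(x, aligned)])
        outputs.append(step_out)
        h = next_in
    return outputs


rng = random.Random(0)
d = 4
layers = [ToyLayer(d, rng) for _ in range(3)]
connector = RecursiveConnector(d, tap_layers=[0, 1], rng=rng)
tokens = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2]]
per_step = recursive_forward(tokens, ["vision", "text"], layers, connector, num_steps=3)
```

Every step's output is returned, so stopping after fewer steps reproduces the early outputs of a deeper run exactly. This is one simple way to picture progressive output refinement under a compute budget.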

Findings

01. +3% improvement over standard Transformers

02. +7% improvement over vanilla recursive baselines

03. Effective resource-constrained deployment and progressive output refinement

Abstract

Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs. Two key innovations enable effective looping: (i) a Recursive Connector that aligns features across recursion steps by fusing intermediate-layer hidden states and applying modality-specific projections, respecting the distinct statistical structures of vision and language tokens; (ii) a Monotonic Recursion Loss that supervises every step and guarantees performance improves monotonically with recursion depth. This design transforms…
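The abstract does not give the exact form of the Monotonic Recursion Loss. The following is a hypothetical sketch of one loss with the two stated properties. Every recursion step is supervised through the mean of the per-step task losses. A hinge term penalizes any step whose loss fails to drop below the previous step's by at least `margin`. The names `monotonic_recursion_loss`, `margin`, and `weight` are illustrative assumptions.

```python
from typing import Sequence


def monotonic_recursion_loss(
    step_losses: Sequence[float], margin: float = 0.0, weight: float = 1.0
) -> float:
    """Hypothetical Monotonic Recursion Loss (not the paper's exact formula).

    step_losses[t] is the task loss computed from the output of recursion step t.
    - Deep supervision: average the task loss over every step.
    - Monotonicity: hinge penalty whenever step t is not better than step t-1
      by at least `margin`.
    """
    if not step_losses:
        raise ValueError("need the loss of at least one recursion step")
    supervision = sum(step_losses) / len(step_losses)
    penalty = sum(
        max(0.0, cur - prev + margin)
        for prev, cur in zip(step_losses, step_losses[1:])
    )
    return supervision + weight * penalty
```

A run whose loss rises from 1.0 to 1.4 pays a penalty of 0.4 on top of the mean of 1.2. A strictly decreasing run with the default `margin=0.0` pays no penalty. A soft penalty like this encourages the ordering during training but does not by itself enforce it.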

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling