EvoLMM: Self-Evolving Large Multimodal Models with Continuous Rewards

Omkar Thawakar; Shravan Venkatraman; Ritesh Thawkar; Abdelrahman Shaker; Hisham Cholakkal; Rao Muhammad Anwer; Salman Khan; Fahad Khan

arXiv:2511.16672 · cs.CV · March 16, 2026

TL;DR

EvoLMM introduces a self-evolving, unsupervised framework for large multimodal models. It improves reasoning through two cooperative agents, one that generates questions and one that solves them, which learn from continuous self-rewards and need no annotated data.

Contribution

The paper presents EvoLMM, a novel unsupervised self-evolving framework with cooperative agents for improving multimodal reasoning without human annotations.

Findings

01. Achieves up to 3% improvement on multimodal math benchmarks.
02. Operates solely on raw images, without external rewards.
03. Demonstrates effective self-improvement in reasoning capabilities.

Abstract

Recent advances in large multimodal models (LMMs) have enabled impressive reasoning and perception abilities, yet most existing training pipelines still depend on human-curated data or externally verified reward models, limiting their autonomy and scalability. In this work, we strive to improve LMM reasoning capabilities in a purely unsupervised fashion (without any annotated data or reward distillation). To this end, we propose a self-evolving framework, named EvoLMM, that instantiates two cooperative agents from a single backbone model: a Proposer, which generates diverse, image-grounded questions, and a Solver, which solves them through internal consistency, where learning proceeds through a continuous self-rewarding process. This dynamic feedback encourages both the generation of informative queries and the refinement of structured reasoning without relying on ground-truth or human…
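The abstract says only that the rewards are continuous and self-generated from internal consistency. The sketch below is a hypothetical illustration of what such rewards could look like, not the paper's formulation. It makes the following assumptions: the Solver samples several answers per Proposer question, and each answer is rewarded by the share of samples that agree with it (a soft version of majority voting). The Proposer is rewarded most when the Solver's answers are partly but not fully consistent, measured by a Gaussian band around a normalized answer entropy. The function names and the parameters `mu` and `sigma` are all assumptions.

```python
import math
from collections import Counter


def solver_rewards(answers):
    """Hypothetical continuous self-consistency reward for each sampled Solver answer.

    Each answer is rewarded with the fraction of all N samples that agree with it,
    so rewards lie in (0, 1] rather than being a hard 0/1 majority vote.
    """
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


def answer_entropy(answers):
    """Shannon entropy of the empirical answer distribution, normalized to [0, 1]."""
    counts = Counter(answers)
    n = len(answers)
    if len(counts) <= 1:  # unanimous (or single) answers: no uncertainty
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)  # log(n) is the maximum, reached when all answers differ


def proposer_reward(answers, mu=0.5, sigma=0.2):
    """Hypothetical band-pass reward for the Proposer.

    Peaks when the Solver is neither unanimous (question too easy) nor
    completely scattered (question too hard or ill-posed).
    """
    h = answer_entropy(answers)
    return math.exp(-((h - mu) ** 2) / (2 * sigma ** 2))


if __name__ == "__main__":
    samples = ["12", "12", "12", "15", "12"]
    print(solver_rewards(samples))
    print(round(proposer_reward(samples), 3))
```

For the five sampled answers above, `solver_rewards` returns `[0.8, 0.8, 0.8, 0.2, 0.8]`: the four agreeing answers share a high reward and the outlier receives a small but nonzero one. This graded signal is what distinguishes a continuous reward from a binary majority vote. The Gaussian band is one arbitrary way to favor questions of intermediate difficulty.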

Code & Models

Models

omkarthawakar/EvoLMM (Hugging Face model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks