iReasoner: Trajectory-Aware Intrinsic Reasoning Supervision for Self-Evolving Large Multimodal Models

Meghana Sunil; Manikandarajan Venmathimaran; Muthu Subash Kavitha

arXiv:2601.05877 · cs.CL · May 21, 2026

TL;DR

iReasoner is a self-evolving framework that improves the reasoning of large multimodal models (LMMs) by explicitly training their intermediate reasoning steps with trajectory-aware intrinsic rewards, yielding gains without any labeled data.

Contribution

The paper presents a new self-evolving approach that explicitly trains intermediate reasoning in LMMs using trajectory-aware signals, improving reasoning without external supervision.

Findings

01. Improvements of up to +2.1 points on multimodal reasoning benchmarks, starting from Qwen2.5-VL-7B.

02. Effective training of intermediate reasoning steps without ground-truth labels or external judges.

03. Code is available at https://meghanaasunil.github.io/iReasoner.

Abstract

Recent work shows that large multimodal models (LMMs) can self-improve from unlabeled data via self-play and intrinsic feedback. Yet existing self-evolving frameworks mainly reward final outcomes, leaving intermediate reasoning weakly constrained despite its importance for visually grounded decision making. We propose iReasoner, a self-evolving framework that improves an LMM's implicit reasoning by explicitly eliciting chain-of-thought (CoT) and rewarding its internal agreement. In a Proposer–Solver loop over unlabeled images, iReasoner augments outcome-level intrinsic rewards with a trajectory-aware signal defined over intermediate reasoning steps, providing learning signals that distinguish reasoning paths leading to the same answer without ground-truth labels or external judges. Starting from Qwen2.5-VL-7B, iReasoner yields up to +2.1 points across diverse multimodal reasoning…
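The abstract does not say how the outcome-level and trajectory-aware rewards are computed, so the sketch below is a hypothetical illustration only, not the authors' implementation. It assumes (1) a binary outcome reward for agreeing with the majority answer across Solver samples, (2) a trajectory term that scores how well a sample's intermediate steps agree with other samples reaching the same answer, using token-set Jaccard overlap as a stand-in similarity, and (3) a hypothetical weight `lam` combining the two. All names (`Trajectory`, `step_agreement`, `intrinsic_rewards`) are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL sketch: not the iReasoner implementation. The majority-vote
# outcome reward, Jaccard step similarity, and weight `lam` are assumptions.


@dataclass
class Trajectory:
    """One Solver sample: intermediate CoT steps plus a final answer."""
    steps: list[str]
    answer: str


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two reasoning steps (stand-in similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def step_agreement(t: Trajectory, others: list[Trajectory]) -> float:
    """Average, over peers, of the mean best-match similarity of t's steps."""
    if not t.steps or not others:
        return 0.0
    per_peer = []
    for o in others:
        if not o.steps:
            per_peer.append(0.0)
            continue
        best = [max(jaccard(s, os) for os in o.steps) for s in t.steps]
        per_peer.append(sum(best) / len(best))
    return sum(per_peer) / len(per_peer)


def intrinsic_rewards(samples: list[Trajectory], lam: float = 0.5) -> list[float]:
    """Outcome reward (agree with majority) plus lam * trajectory agreement."""
    majority, _ = Counter(t.answer for t in samples).most_common(1)[0]
    rewards = []
    for i, t in enumerate(samples):
        r_out = 1.0 if t.answer == majority else 0.0
        # Peers are the other samples that reached the same final answer.
        peers = [o for j, o in enumerate(samples) if j != i and o.answer == t.answer]
        r_traj = step_agreement(t, peers) if r_out else 0.0
        rewards.append(r_out + lam * r_traj)
    return rewards


# Usage: three samples answer "3", but one reaches it by an unrelated path.
samples = [
    Trajectory(["count the red blocks", "there are three red blocks"], "3"),
    Trajectory(["count the red blocks", "three red blocks are visible"], "3"),
    Trajectory(["guess from the image size"], "3"),
    Trajectory(["count all blocks", "there are five blocks"], "5"),
]
rewards = intrinsic_rewards(samples)
print(rewards)
```

In this toy example the outcome term alone would give the three "3" samples identical rewards; the trajectory term ranks the guessed path below the two that share counting steps, which is the kind of distinction between same-answer reasoning paths that the abstract describes.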

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling