TL;DR
iReasoner is a self-evolving framework that strengthens the reasoning of large multimodal models (LMMs) by explicitly training their intermediate reasoning steps with trajectory-aware intrinsic rewards. This improves benchmark performance without labeled data.
Contribution
The paper presents a new self-evolving approach that explicitly trains intermediate reasoning in LMMs using trajectory-aware signals, improving reasoning without external supervision.
Findings
Up to +2.1 points improvement on multimodal reasoning benchmarks.
Effective training of intermediate reasoning steps without ground-truth labels.
Code available at https://meghanaasunil.github.io/iReasoner.
Abstract
Recent work shows that large multimodal models (LMMs) can self-improve from unlabeled data via self-play and intrinsic feedback. Yet existing self-evolving frameworks mainly reward final outcomes, leaving intermediate reasoning weakly constrained despite its importance for visually grounded decision making. We propose iReasoner, a self-evolving framework that improves an LMM's implicit reasoning by explicitly eliciting chain-of-thought (CoT) and rewarding its internal agreement. In a Proposer–Solver loop over unlabeled images, iReasoner augments outcome-level intrinsic rewards with a trajectory-aware signal defined over intermediate reasoning steps, providing learning signals that distinguish reasoning paths leading to the same answer without ground-truth labels or external judges. Starting from Qwen2.5-VL-7B, iReasoner yields up to points across diverse multimodal reasoning…
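The abstract does not say how the trajectory-aware signal is computed. The sketch below is one illustrative way to combine an outcome-level intrinsic reward with a step-level agreement term so that rollouts reaching the same answer by different paths receive different rewards. It assumes several things: the Solver samples several CoT rollouts per Proposer question, the outcome reward is majority-vote agreement, and the trajectory term is step-aligned token Jaccard similarity among rollouts that share an answer, weighted by `lam`. The names `Rollout`, `step_similarity`, `trajectory_agreement`, `intrinsic_rewards`, and `lam` are all hypothetical. This is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One hypothetical Solver sample: ordered reasoning steps plus a final answer."""
    steps: List[str]
    answer: str


def step_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two steps (an assumed stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def trajectory_agreement(r: Rollout, peers: List[Rollout]) -> float:
    """Mean step-aligned similarity between r and peers that reached the same answer."""
    if not peers:
        return 0.0
    scores = []
    for p in peers:
        n = min(len(r.steps), len(p.steps))
        if n == 0:
            scores.append(0.0)
            continue
        aligned = sum(step_similarity(r.steps[i], p.steps[i]) for i in range(n)) / n
        # Scale down when the two trajectories have different numbers of steps.
        aligned *= n / max(len(r.steps), len(p.steps))
        scores.append(aligned)
    return sum(scores) / len(scores)


def intrinsic_rewards(rollouts: List[Rollout], lam: float = 0.5) -> List[float]:
    """Majority-vote outcome reward plus a lam-weighted trajectory-aware term."""
    votes = Counter(r.answer for r in rollouts)
    majority, _ = votes.most_common(1)[0]
    rewards = []
    for i, r in enumerate(rollouts):
        outcome = 1.0 if r.answer == majority else 0.0
        peers = [p for j, p in enumerate(rollouts) if j != i and p.answer == r.answer]
        # Only majority-agreeing rollouts earn the trajectory bonus.
        traj = trajectory_agreement(r, peers) if outcome else 0.0
        rewards.append(outcome + lam * traj)
    return rewards


# Toy example: three rollouts answer "3", but one takes a shortcut path.
rollouts = [
    Rollout(["count the red cubes", "there are three red cubes"], "3"),
    Rollout(["count the red cubes", "there are three red cubes"], "3"),
    Rollout(["guess from the image size"], "3"),
    Rollout(["count the blue spheres", "there are four"], "4"),
]
rewards = intrinsic_rewards(rollouts)
print(rewards)
```

Under these assumptions, the two consistent rollouts and the shortcut rollout all receive the same outcome reward. The trajectory term then separates them, and the shortcut scores lower. This matches the abstract's goal of distinguishing reasoning paths that reach the same answer. A real system would more likely measure step agreement with model-based or embedding similarity rather than token overlap.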
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
