When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

Zhengxian Wu; Kai Shi; Chuanrui Zhang; Zirui Liao; Jun Yang; Ni Yang; Qiuying Peng; Luyuan Zhang; Hangrui Xu; Tianhuang Su; Zhenyu Yang; Haonan Lu; Haoqian Wang

arXiv:2603.21289 · cs.CV · March 25, 2026

Open Access

TL;DR

This paper introduces an unsupervised self-evolution training framework for multimodal reasoning models. The model judges its own sampled reasoning trajectories, so performance improves without annotated answers or external reward models.

Contribution

An unsupervised training method that samples multiple reasoning trajectories per input and scores them through self-judgment, improving multimodal reasoning models without human annotations.

Findings

01 Consistent performance improvements on five mathematical reasoning benchmarks.

02 Effective use of self-consistency signals as training priors.

03 Robust policy updates via group-level relative scoring.

Abstract

Recent progress in multimodal large language models has led to strong performance on reasoning tasks, but these improvements largely rely on high-quality annotated data or teacher-model distillation, both of which are costly and difficult to scale. To address this, we propose an unsupervised self-evolution training framework for multimodal reasoning that achieves stable performance improvements without using human-annotated answers or external reward models. For each input, we sample multiple reasoning trajectories and jointly model their within-group structure. We use the Actor's self-consistency signal as a training prior, and introduce a bounded Judge-based modulation to continuously reweight trajectories of different quality. We further model the modulated scores as a group-level distribution and convert absolute scores into relative advantages within each group, enabling more…
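The scoring pipeline the abstract describes (self-consistency prior, bounded Judge modulation, group-relative advantages) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the vote-share prior, the tanh-based bound, the `strength` parameter, the mean/std normalization, and all function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def self_consistency_prior(answers):
    """Assumed prior: the share of sampled trajectories that reach the same final answer."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


def bounded_modulation(judge_scores, strength=0.5):
    """Assumed bound: map a judge score in [0, 1] to a factor in (1 - strength, 1 + strength)."""
    return [1.0 + strength * math.tanh(2.0 * (s - 0.5)) for s in judge_scores]


def group_relative_advantages(scores, eps=1e-8):
    """Convert absolute scores into advantages relative to the group mean and spread."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / (std + eps) for s in scores]


def trajectory_advantages(answers, judge_scores, strength=0.5):
    """Combine the three steps for one group of trajectories sampled from the same input."""
    prior = self_consistency_prior(answers)
    factor = bounded_modulation(judge_scores, strength)
    modulated = [p * f for p, f in zip(prior, factor)]
    return group_relative_advantages(modulated)


# Hypothetical group of four trajectories: three agree on "42", one answers "17".
answers = ["42", "42", "17", "42"]
judge_scores = [0.9, 0.5, 0.2, 0.7]  # made-up self-judgment scores in [0, 1]
advantages = trajectory_advantages(answers, judge_scores)
```

In this toy group, the majority answer receives a higher prior, the judge factor reorders trajectories that share an answer, and the final advantages are centered on zero within the group. Because the modulation factor is bounded, a single extreme judge score cannot override the consistency prior entirely, which is one plausible reading of why the abstract stresses a bounded modulation.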

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning