Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

Yunyao Yu; Zhengxian Wu; Zhuohong Chen; Hangrui Xu; Zirui Liao; Xiangwen Deng; Zhifang Liu; Senyuan Shi; Haoqian Wang

arXiv:2604.03647 · cs.CV · April 9, 2026

TL;DR

This paper introduces CSRS, a method that stabilizes the unsupervised self-evolution of multimodal large language models (MLLMs) and improves their reasoning accuracy through retracing re-inference and continuous reward calibration.

Contribution

The paper proposes CSRS, which combines a Retracing Re-inference Mechanism (RRM), a continuous Softened Frequency Reward (SFR), and visual perturbation to enhance reasoning in MLLMs during self-evolution.

Findings

01

CSRS significantly improves reasoning performance on benchmarks like MathVision.

02

CSRS achieves state-of-the-art results among unsupervised self-evolution methods on geometric tasks.

03

Code is publicly available in the yyy195/CSRS GitHub repository.

Abstract

In the unsupervised self-evolution of Multimodal Large Language Models (MLLMs), the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods predominantly rely on majority voting to select the most frequent output as the pseudo-golden answer. That answer may reflect the model's intrinsic biases rather than the objective correctness of the reasoning paths. To counteract the resulting degradation, we propose Continuous Softened Retracing reSampling (CSRS) for MLLM self-evolution. Specifically, we introduce a Retracing Re-inference Mechanism (RRM), in which the model re-infers from anchor points to expand the exploration of long-tail reasoning paths. Simultaneously, we propose Softened Frequency Reward (SFR), which replaces binary rewards with continuous signals, calibrating reward based on the answers' frequency across…
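The contrast between a binary majority-vote reward and a continuous frequency-based reward can be illustrated with a minimal sketch. The abstract above is truncated and does not give the SFR formula, so the relative-frequency form below is an assumption made purely for illustration, not the authors' definition; the function names `majority_vote_rewards` and `frequency_rewards` and the sampled answers are hypothetical.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Binary reward: 1.0 if an answer equals the most frequent answer, else 0.0."""
    counts = Counter(answers)
    pseudo_label, _ = counts.most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


def frequency_rewards(answers):
    """Continuous reward: each answer's share of the sampled group.

    ASSUMPTION: plain relative frequency stands in for the paper's
    Softened Frequency Reward, whose exact form is not given here.
    """
    counts = Counter(answers)
    total = len(answers)
    return [counts[a] / total for a in answers]


# Hypothetical final answers from six sampled reasoning paths for one question.
sampled = ["12", "12", "15", "12", "9", "15"]
binary = majority_vote_rewards(sampled)
soft = frequency_rewards(sampled)

for answer, b, s in zip(sampled, binary, soft):
    print(f"answer={answer:>2}  binary={b:.1f}  soft={s:.3f}")
```

Under the binary scheme every minority answer receives the same zero reward, whereas the frequency-based scheme still separates an answer seen twice from one seen once, which gives a graded signal for less common reasoning paths.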

Code & Models

Repositories

yyy195/CSRS (GitHub)
