RISE: Reliable Improvement in Self-Evolving Vision-Language Models

Chaoran Xu; Yingmao Miao; Pengfei Zhang; Hao Dou; Lei Sun; Xiangxiang Chu

arXiv:2605.20914·cs.CV·May 21, 2026

TL;DR

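RISE is a self-evolving framework for vision-language models (VLMs) that aims to improve their reasoning without large-scale human-constructed supervision. It targets three weaknesses of existing questioner-solver loops: delayed interaction between question generation and solver adaptation, progressive degradation of question quality, and collapse of question types toward a narrow distribution.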

Contribution

The paper proposes a self-evolving approach that combines fine-grained role alternation, quality supervision, and dynamic balancing to improve VLMs without extensive human supervision.

Findings

1. Consistent performance improvements across seven benchmarks.
2. Enhanced question validity and pseudo-label reliability.
3. Broader and sustained skill coverage during evolution.

Abstract

Vision-language models (VLMs) have achieved strong multimodal reasoning capabilities, but further improving them still relies heavily on large-scale human-constructed supervision for post-training. Such supervision is costly to obtain, especially for reasoning-intensive multimodal tasks where questions, answers, and feedback signals must be carefully designed. This motivates self-evolving learning, where a model improves itself through a dual-role closed loop: a questioner autonomously poses questions and a solver learns to solve them. However, we observe that current VLM self-evolving methods still face three major challenges: coarse-grained role alternation delays the interaction between question generation and solver adaptation; generated questions can progressively degrade in quality; and question types may collapse toward a narrow distribution. These issues limit the efficiency and…
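To make the dual-role closed loop described in the abstract concrete, here is a minimal toy sketch in Python. It is not the RISE implementation: the questioner and solver are stand-in functions with no real VLM behind them, the question types in `QUESTION_TYPES` are hypothetical, majority voting is assumed as one common way to form a pseudo-label, and the per-step alternation, the `0.6` agreement threshold, and the `0.01` skill increment are illustrative choices only. The sketch shows where the three challenges from the abstract would surface: how often the two roles interact, how reliable each pseudo-label is, and how question types are distributed.

```python
import random
from collections import Counter

# Hypothetical question categories; the source does not list the actual types.
QUESTION_TYPES = ["counting", "spatial", "ocr"]


def toy_questioner(rng):
    """Stand-in questioner: emits a question type and a toy addition problem."""
    qtype = rng.choice(QUESTION_TYPES)
    operands = (rng.randint(1, 9), rng.randint(1, 9))
    return qtype, operands


def toy_solver(operands, skill, rng):
    """Stand-in solver: returns the correct sum with probability `skill`."""
    a, b = operands
    if rng.random() < skill:
        return a + b
    return a + b + rng.choice([-1, 1])  # off-by-one mistake


def majority_pseudo_label(answers):
    """Assumed labeling rule: most common answer plus its agreement ratio."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def self_evolve(steps=20, samples=5, seed=0):
    """Alternate questioner and solver every step (an illustrative schedule)."""
    rng = random.Random(seed)
    skill = 0.6                # toy proxy for solver ability
    type_counts = Counter()    # tracks question-type coverage
    for _ in range(steps):
        qtype, operands = toy_questioner(rng)
        answers = [toy_solver(operands, skill, rng) for _ in range(samples)]
        _label, agreement = majority_pseudo_label(answers)
        type_counts[qtype] += 1
        # Stand-in for a solver update: only learn from confident pseudo-labels.
        if agreement >= 0.6:
            skill = min(1.0, skill + 0.01)
    return skill, type_counts


if __name__ == "__main__":
    final_skill, coverage = self_evolve()
    print(f"final toy skill: {final_skill:.2f}")
    print(f"question-type coverage: {dict(coverage)}")
```

In this toy setting, a skewed `type_counts` would correspond to the type collapse the abstract warns about, and a low agreement ratio would flag an unreliable pseudo-label.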

Code & Models

Repositories

AMAP-ML/RISE (GitHub)
