EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations

Yongrui Heng, Chaoya Jiang, Han Yang, Shikun Zhang, Wei Ye

arXiv:2604.18320 · cs.CV · April 21, 2026

TL;DR

EVE is a framework for the self-evolution of multimodal large language models (MLLMs) that uses executable visual transformations to generate training data with verifiable answers. It avoids pseudo-labels and keeps the training tasks diverse and challenging, enabling continuous model improvement.

Contribution

EVE introduces a Challenger-Solver dual-policy architecture that synthesizes dynamic visual transformations paired with verified ground-truth answers, giving self-evolving MLLMs supervision that does not depend on pseudo-labels.
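
To make "verified ground-truth answers" concrete, the following is a minimal Python sketch under stated assumptions, not the authors' code. It treats an image as a small integer grid and a visual transformation as an ordinary function; the transformation set, the question template, and every name (`TRANSFORMS`, `run_program`, `make_task`, `Task`) are hypothetical. What it illustrates is that the answer is read off the executed result, so it is correct by construction, and that composing more operations is one simple way to raise difficulty.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Grid = List[List[int]]  # toy stand-in for an image: rows of pixel values


def rotate90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_h(g: Grid) -> Grid:
    """Mirror the grid left to right."""
    return [row[::-1] for row in g]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(row) for row in zip(*g)]


# Hypothetical transformation library; not the set of operations used by EVE.
TRANSFORMS: Dict[str, Callable[[Grid], Grid]] = {
    "rotate90": rotate90,
    "flip_h": flip_h,
    "transpose": transpose,
}


@dataclass
class Task:
    program: List[str]  # names of the executed transformations, in order
    row: int
    col: int
    question: str
    answer: int  # read off the executed result, so correct by construction


def run_program(image: Grid, program: List[str]) -> Grid:
    """Execute the named transformations in order."""
    out = image
    for name in program:
        out = TRANSFORMS[name](out)
    return out


def make_task(image: Grid, depth: int, rng: random.Random) -> Task:
    """Sample a `depth`-step program, execute it, and derive a QA pair.

    Longer programs compose more operations (a simple difficulty knob).
    """
    program = [rng.choice(sorted(TRANSFORMS)) for _ in range(depth)]
    result = run_program(image, program)
    row = rng.randrange(len(result))
    col = rng.randrange(len(result[0]))
    question = (
        f"Apply {' then '.join(program)} to the image. "
        f"What value is at row {row}, column {col}?"
    )
    return Task(program, row, col, question, result[row][col])


# Usage: the label comes from execution, never from a model prediction.
task = make_task([[1, 2, 3], [4, 5, 6]], depth=2, rng=random.Random(0))
print(task.question, "->", task.answer)
```

A practical version would operate on real image arrays, and the Challenger-Solver interaction is deliberately left out of this sketch; only the "execute, then read off the answer" step is shown.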

Findings

1. EVE outperforms existing self-evolution approaches in experiments.
2. The framework effectively maintains diversity and difficulty in training tasks.
3. EVE's approach ensures verifiable supervision without relying on model predictions.

Abstract

Self-evolution of multimodal large language models (MLLMs) remains a critical challenge: pseudo-label-based methods suffer from progressive quality degradation as model predictions drift, while template-based methods are confined to a static set of transformations that cannot adapt in difficulty or diversity. We contend that robust, continuous self-improvement requires not only deterministic external feedback independent of the model's internal certainty, but also a mechanism to perpetually diversify the training distribution. To this end, we introduce EVE (Executable Visual transformation-based self-Evolution), a novel framework that entirely bypasses pseudo-labels by harnessing executable visual transformations continuously enriched in both variety and complexity. EVE adopts a Challenger-Solver dual-policy architecture. The Challenger maintains and progressively expands a queue of…
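
The contrast the abstract draws between pseudo-labels and deterministic external feedback can be shown in a few lines of Python. Majority voting over the model's own samples stands in for pseudo-labelling here, which is an assumption made for illustration; the verified reward assumes the ground truth was produced by executing a transformation. Both function names are hypothetical.

```python
from collections import Counter
from typing import List


def pseudo_label_rewards(samples: List[str]) -> List[float]:
    """Assumed pseudo-labelling baseline: reward agreement with the majority answer."""
    majority, _ = Counter(samples).most_common(1)[0]
    return [1.0 if s == majority else 0.0 for s in samples]


def verified_rewards(samples: List[str], ground_truth: str) -> List[float]:
    """Reward agreement with an answer obtained by executing a transformation."""
    return [1.0 if s == ground_truth else 0.0 for s in samples]


# Suppose executing the transformation yields "7", but most samples say "4".
samples = ["4", "4", "7", "4"]
print(pseudo_label_rewards(samples))   # [1.0, 1.0, 0.0, 1.0]
print(verified_rewards(samples, "7"))  # [0.0, 0.0, 1.0, 0.0]
```

Under the pseudo-label reward, the three wrong samples are reinforced and the correct one is penalized, so errors can compound as predictions drift across rounds. The executed answer rewards only the correct sample, independent of how confident the model is.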

Code & Models

Repositories

0001Henry/EVE (GitHub)