C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Xiuwei Chen; Wentao Hu; Hanhui Li; Jun Zhou; Zisheng Chen; Meng Cao; Yihan Zeng; Kui Zhang; Yu-Jie Yuan; Jianhua Han; Hang Xu; Xiaodan Liang

arXiv:2507.16518 · cs.CV · July 30, 2025

TL;DR

C2-Evo introduces a self-improving framework that jointly evolves multimodal data and models, enhancing reasoning capabilities by iteratively generating complex problems and adapting the model through a closed-loop process.

Contribution

The paper presents a closed-loop system that co-evolves data and models for multimodal reasoning, addressing two problems in self-improving models: mismatched complexity between visual and textual data, and mismatch between task difficulty and model capability.

Findings

01 Significant performance improvements on multiple mathematical reasoning benchmarks.

02 Effective generation of complex, structured multimodal problems.

03 Demonstrated continuous refinement of models and data through the framework.

Abstract

Recent advances in multimodal large language models (MLLMs) have shown impressive reasoning capabilities. However, further enhancing existing MLLMs necessitates high-quality vision-language datasets with carefully curated task complexities, which are both costly and challenging to scale. Although recent self-improving models that iteratively refine themselves offer a feasible solution, they still suffer from two core challenges: (i) most existing methods augment visual or textual data separately, resulting in discrepancies in data complexity (e.g., over-simplified diagrams paired with redundant textual descriptions); and (ii) the evolution of data and models is also separated, leading to scenarios where models are exposed to tasks with mismatched difficulty levels. To address these issues, we propose C2-Evo, an automatic, closed-loop self-improving framework that jointly evolves both…

Taxonomy

Topics: Semantic Web and Ontologies · Speech and dialogue systems · Natural Language Processing Techniques