Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning

Qihan Huang; Haofei Zhang; Rong Wei; Yi Wang; Rui Tang; Mingli Song; Jie Song

arXiv:2511.19343 · cs.CV · November 25, 2025

Open Access

TL;DR

Syn-GRPO introduces an online data synthesis approach for MLLM perception tasks, significantly enhancing data quality and model performance through diverse, high-quality training samples generated via an image synthesis model and diversity rewards.

Contribution

This work presents Syn-GRPO, a novel self-evolving data synthesis framework that improves reinforcement learning for MLLMs by generating diverse training data with an efficient, decoupled architecture.

Findings

01. Achieves significant performance improvements over existing methods.

02. Enhances data diversity and quality in MLLM perception tasks.

03. Demonstrates scalability for long-term self-evolving reinforcement learning.

Abstract

Reinforcement learning (RL) methods (e.g., GRPO) for improving the perception ability of multimodal LLMs (MLLMs) have attracted wide research interest owing to their remarkable generalization ability. Nevertheless, existing RL methods still suffer from low data quality: data samples fail to elicit diverse responses from the MLLM, which restricts the exploration scope of reinforcement learning. Some methods attempt to mitigate this problem by imposing constraints on entropy, but none address it at its root. To tackle this problem, this work proposes Syn-GRPO (Synthesis-GRPO), which employs an online data generator to synthesize high-quality training data with diverse responses in GRPO training. Specifically, Syn-GRPO consists of two components: (1) a data server and (2) a GRPO workflow. The data server synthesizes new samples from existing ones using an image…
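As background for the low-diversity problem the abstract describes, the sketch below shows the group-relative advantage used in standard GRPO, where each response's reward is normalized against the other responses sampled for the same prompt. It is not the Syn-GRPO implementation; the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Hypothetical helper for illustration: `rewards` holds one scalar reward
    per response sampled for a single prompt. `eps` avoids division by zero
    when every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# A sample that elicits mixed outcomes yields non-zero advantages.
diverse = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A sample where every response earns the same reward yields all zeros,
# so it contributes no policy-gradient signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

When all responses in a group receive the same reward, every advantage is zero and the sample carries no learning signal, which is one way to read the abstract's claim that low-diversity samples restrict exploration.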

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Advanced Neural Network Applications