Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning
Qihan Huang, Haofei Zhang, Rong Wei, Yi Wang, Rui Tang, Mingli Song, Jie Song

TL;DR
Syn-GRPO introduces an online data synthesis approach for MLLM perception tasks, significantly enhancing data quality and model performance through diverse, high-quality training samples generated via an image synthesis model and diversity rewards.
Contribution
This work presents Syn-GRPO, a novel self-evolving data synthesis framework that improves reinforcement learning for MLLMs by generating diverse training data with an efficient, decoupled architecture.
Findings
Achieves significant performance improvements over existing methods.
Enhances data diversity and quality in MLLM perception tasks.
Demonstrates scalability for long-term self-evolving reinforcement learning.
Abstract
Reinforcement learning (RL) methods (e.g., GRPO) for improving the perception ability of multimodal LLMs (MLLMs) have attracted wide research interest owing to their remarkable generalization ability. Nevertheless, existing RL methods still suffer from low data quality: the training samples cannot elicit diverse responses from MLLMs, which restricts the exploration scope of MLLM reinforcement learning. Some methods attempt to mitigate this problem by imposing constraints on entropy, but none address it at its root. To tackle this problem, this work proposes Syn-GRPO (Synthesis-GRPO), which employs an online data generator to synthesize high-quality training data with diverse responses during GRPO training. Specifically, Syn-GRPO consists of two components: (1) a data server and (2) a GRPO workflow. The data server synthesizes new samples from existing ones using an image…
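To illustrate why samples that fail to elicit diverse responses are a problem for GRPO, the sketch below computes the standard group-relative advantage, in which each response's reward is standardized against the other responses to the same sample. This is a minimal illustration of generic GRPO, not the Syn-GRPO implementation; the function name `group_relative_advantages`, the `eps` constant, and the reward values are assumptions made for the example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same sample.

    Generic GRPO-style normalization: (reward - group mean) / (group std + eps).
    The population standard deviation is used here; this is an assumption,
    and implementations may differ on that detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four responses to a sample the model always solves.
uniform_group = [1.0, 1.0, 1.0, 1.0]
# Hypothetical rewards for a sample that elicits both correct and incorrect responses.
diverse_group = [1.0, 0.0, 1.0, 0.0]

# Identical rewards give zero advantage for every response, so the sample
# contributes no policy-gradient signal.
print(group_relative_advantages(uniform_group))
# Mixed rewards give positive advantages to the better responses and
# negative advantages to the worse ones.
print(group_relative_advantages(diverse_group))
```

When every response in a group earns the same reward, all advantages are zero and the sample does not move the policy. This is one way to read the abstract's claim that low-diversity data restricts exploration.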
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Advanced Neural Network Applications
