R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

Huanjin Yao; Qixiang Yin; Jingyi Zhang; Min Yang; Yibo Wang; Wenhao Wu; Fei Su; Li Shen; Minghui Qiu; Dacheng Tao; Jiaxing Huang

arXiv:2505.16673·cs.CV·May 23, 2025

TL;DR

This paper introduces Share-GRPO, a reinforcement learning method that strengthens the reasoning of Multimodal Large Language Models (MLLMs) by exploring diverse reasoning trajectories over an expanded question space and sharing them across question variants; the resulting model outperforms existing methods on six reasoning benchmarks.

Contribution

The paper proposes Share-GRPO, a novel RL approach that mitigates sparse rewards and advantage vanishing by expanding the question space and sharing both reasoning trajectories and reward information during training.

Findings

01. Outperforms existing methods on six reasoning benchmarks.
02. Effectively explores diverse reasoning trajectories.
03. Improves stability and accuracy of policy training.

Abstract

In this work, we aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL) and develop an effective approach that mitigates the sparse reward and advantage vanishing issues during RL. To this end, we propose Share-GRPO, a novel RL approach that tackles these issues by exploring and sharing diverse reasoning trajectories over an expanded question space. Specifically, Share-GRPO first expands the question space for a given question via data transformation techniques, then encourages the MLLM to explore diverse reasoning trajectories over the expanded question space, and shares the discovered reasoning trajectories across the expanded questions during RL. In addition, Share-GRPO also shares reward information during advantage computation, which estimates solution advantages hierarchically across and within question variants,…
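The advantage-vanishing issue and the idea of sharing reward information across question variants can be illustrated with a small sketch. This is not the paper's formulation: the abstract does not give the exact formula, so the function names (`grpo_advantages`, `shared_advantages`), the use of mean/std normalization at both levels, and the choice to sum the two terms are all assumptions made for illustration only.

```python
import numpy as np


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages for the responses to ONE question.

    If every response receives the same reward, all advantages are zero,
    which is the "advantage vanishing" case.
    """
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def shared_advantages(rewards_per_variant, eps=1e-6):
    """Hypothetical hierarchical advantage over question variants.

    rewards_per_variant has shape (num_variants, num_responses).
    ASSUMPTION: the two levels are combined by simple addition; the
    actual Share-GRPO weighting is not stated in the abstract.
    """
    r = np.asarray(rewards_per_variant, dtype=float)
    # Across variants: normalize against the pooled rewards of all variants.
    across = (r - r.mean()) / (r.std() + eps)
    # Within a variant: normalize against that variant's own responses.
    within = (r - r.mean(axis=1, keepdims=True)) / (
        r.std(axis=1, keepdims=True) + eps
    )
    return across + within


# One variant is always answered correctly, the other never.
rewards = [[1.0, 1.0, 1.0, 1.0],
           [0.0, 0.0, 0.0, 0.0]]

print(grpo_advantages(rewards[0]))   # all zeros: no learning signal
print(shared_advantages(rewards))    # non-zero once rewards are pooled
```

In this toy case each variant on its own yields zero advantages, while pooling rewards across the two variants gives the correct responses a positive advantage and the incorrect ones a negative advantage.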

Code & Models

Models

HuanjinYao/R1-ShareVL-7B

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning