SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Kaixuan Fan; Kaituo Feng; Haoming Lyu; Dongzhan Zhou; Xiangyu Yue

arXiv:2505.17018·cs.CV·March 18, 2026


TL;DR

SophiaVL-R1 enhances multimodal large language models by incorporating a thinking reward mechanism with trust-based weighting and annealing strategies, leading to improved reasoning and generalization on multiple benchmarks.

Contribution

The paper introduces a thinking reward model, applied with trust weighting and annealing during RL training, to improve reasoning in multimodal LLMs. The resulting 7B model outperforms larger models such as LLaVA-OneVision-72B.

Findings

01

Outperforms existing reasoning MLLMs on benchmarks such as MathVista and MMMU.

02

SophiaVL-R1-7B surpasses larger models like LLaVA-OneVision-72B.

03

Trust-GRPO effectively mitigates reward hacking on the thinking reward.

Abstract

Recent advances have shown success in eliciting strong reasoning abilities in multimodal large language models (MLLMs) through rule-based reinforcement learning (RL) with outcome rewards. However, this paradigm typically lacks supervision over the thinking process leading to the final outcome. As a result, the model may learn sub-optimal reasoning strategies, which can hinder its generalization ability. In light of this, we propose SophiaVL-R1, as an attempt to add reward signals for the thinking process in this paradigm. To achieve this, we first train a thinking reward model that evaluates the quality of the entire thinking process. Given that the thinking reward may be unreliable for certain samples due to reward hacking, we propose the Trust-GRPO method, which assigns a trustworthiness weight to the thinking reward during training. This weight is computed based on the thinking…
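The way an outcome reward and a thinking reward might be combined can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract is cut off before it defines the trustworthiness weight, so `trust` is passed in as a plain number rather than computed. The additive combination, the exponential schedule in `annealing_factor`, and all function and parameter names are assumptions made for illustration. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO recipe.

```python
import math
from statistics import mean, pstdev


def annealing_factor(step: int, total_steps: int, rate: float = 5.0) -> float:
    """Hypothetical schedule: 1.0 at step 0, decaying toward exp(-rate)."""
    return math.exp(-rate * step / total_steps)


def combined_rewards(outcome, thinking, trust, step, total_steps):
    """Add the thinking reward to the outcome reward for each sampled response.

    `trust` is an assumed scalar in [0, 1] for the whole group; how it is
    derived is not shown here. The thinking term is scaled by both the
    trust weight and the annealing factor (an assumed additive form).
    """
    decay = annealing_factor(step, total_steps)
    return [o + trust * decay * t for o, t in zip(outcome, thinking)]


def group_advantages(rewards, eps: float = 1e-6):
    """GRPO-style advantages: normalize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    outcome = [1.0, 0.0, 1.0, 0.0]   # rule-based correctness of each response
    thinking = [0.9, 0.8, 0.3, 0.2]  # scores from a thinking reward model
    rewards = combined_rewards(outcome, thinking, trust=0.5, step=0, total_steps=100)
    print(group_advantages(rewards))
```

With `trust` set to 0, or late in training when the annealing factor is close to 0, the combined reward reduces to the rule-based outcome reward alone. That is the intended effect of down-weighting a thinking reward that may be unreliable.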


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare