Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Lai Wei; Yuting Li; Kaipeng Zheng; Chen Wang; Yue Wang; Linghe Kong; Lichao Sun; Weiran Huang

arXiv:2505.22334 · cs.CL · July 24, 2025


TL;DR

This paper introduces a two-stage training approach, supervised fine-tuning followed by reinforcement learning, that improves reasoning in multimodal large language models (MLLMs) and achieves state-of-the-art results among open-source MLLMs at the 3B and 7B scales.

Contribution

The study demonstrates that a supervised fine-tuning cold start on structured chain-of-thought reasoning patterns, followed by RL refinement via GRPO, improves multimodal reasoning performance.

Findings

01 The combined SFT and RL approach outperforms SFT-only and RL-only methods on multimodal reasoning benchmarks

02 Achieves state-of-the-art results among open-source MLLMs at the 3B and 7B scales

03 The 7B model shows substantial improvements over base models

Abstract

Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While "aha moment" patterns (where models exhibit self-correction through reflection) are often attributed to emergent properties from RL, we first demonstrate that these patterns exist in multimodal LLMs (MLLMs) prior to RL training but may not necessarily correlate with improved reasoning performance. Building on these insights, we present a comprehensive study on enhancing multimodal reasoning through a two-stage approach: (1) supervised fine-tuning (SFT) as a cold start with structured chain-of-thought reasoning patterns, followed by (2) reinforcement learning via GRPO to further refine these capabilities. Our extensive experiments show that this combined approach consistently outperforms…
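
The second stage relies on GRPO, whose core idea is to score each sampled response relative to the other responses drawn for the same prompt. The following is a minimal sketch of that group-relative normalization, not the authors' implementation. The function name, the `eps` constant, the use of the population standard deviation, and the binary correctness rewards are all assumptions made for illustration; the abstract does not specify the reward design or hyperparameters.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every response gets the same reward, all advantages are zero,
# so such a group contributes no learning signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Correct responses receive positive advantages and incorrect ones negative advantages, so the policy update favors responses that beat the group average without requiring a separate value model.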


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics

Methods: Balanced Selection