Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

Haibo Qiu; Xiaohan Lan; Fanfan Liu; Xiaohu Sun; Delian Ruan; Peng Shi; Lin Ma

arXiv:2506.13056·cs.AI·June 27, 2025

Open Access · 1 Repo · 2 Models

TL;DR

Metis-RISE is a training paradigm for multimodal reasoning models that starts with reinforcement learning (RL) instead of an initial supervised fine-tuning (SFT) stage, then applies targeted SFT. The resulting models reach state-of-the-art results on the OpenCompass Multimodal Reasoning Leaderboard.

Contribution

The paper proposes a new training approach that omits initial SFT, using RL to activate reasoning capabilities, then applying SFT to address sampling inefficiencies and capability gaps.
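
To make the split between "sampling inefficiencies" and "capability gaps" concrete, here is a minimal sketch of how prompts might be routed to the SFT stage. It assumes, which the summary above does not state, that the RL-trained policy is sampled several times per prompt and each sample is checked for correctness. The names `PromptStats` and `partition_for_sft` and the three bucket labels are hypothetical and not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromptStats:
    """Outcome of sampling the RL-trained policy several times on one prompt."""
    prompt_id: str
    num_samples: int   # how many responses were sampled (assumed setup)
    num_correct: int   # how many of those were judged correct

    @property
    def pass_rate(self) -> float:
        # Fraction of sampled responses that were correct; 0.0 if nothing was sampled.
        return self.num_correct / self.num_samples if self.num_samples else 0.0


def partition_for_sft(stats: List[PromptStats]) -> Dict[str, List[str]]:
    """Group prompt ids by how reliably the policy solves them.

    - "solved":      every sample correct, so no SFT target is needed.
    - "inefficient": some but not all samples correct (a sampling inefficiency).
    - "absent":      no sample correct (a possible capability gap).
    """
    buckets: Dict[str, List[str]] = {"solved": [], "inefficient": [], "absent": []}
    for s in stats:
        if s.num_correct == 0:
            buckets["absent"].append(s.prompt_id)
        elif s.num_correct < s.num_samples:
            buckets["inefficient"].append(s.prompt_id)
        else:
            buckets["solved"].append(s.prompt_id)
    return buckets
```

Under this assumed setup, the "inefficient" and "absent" buckets are the ones a targeted SFT stage would draw from. How the training targets for each bucket are actually built is not described in the summary above.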

Findings

01. Achieves state-of-the-art results on the OpenCompass Multimodal Reasoning Leaderboard.

02. Demonstrates the effectiveness of RL incentivization without initial SFT.

03. Shows improved reasoning performance in 7B and 72B parameter models.

Abstract

Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model's exploratory capacity and face suboptimal convergence. In this work, we introduce Metis-RISE (RL Incentivizes and SFT Enhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy…

Code & Models

Repositories

mm-thinking/metis-rise (Official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Supervised Fine-Tuning (SFT)