Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
Haibo Qiu, Xiaohan Lan, Fanfan Liu, Xiaohu Sun, Delian Ruan, Peng Shi, Lin Ma

TL;DR
Metis-RISE is a training paradigm for multimodal reasoning models that skips the usual initial supervised fine-tuning (SFT) stage: it starts with reinforcement learning (RL) and then applies targeted SFT. The authors report state-of-the-art results on the OpenCompass Multimodal Reasoning Leaderboard.
Contribution
The paper proposes a training approach that omits the initial SFT stage. RL is used first to activate the model's reasoning capabilities, and SFT is applied afterwards to address the sampling inefficiencies and capability gaps that remain.
Findings
Achieves state-of-the-art results on the OpenCompass Multimodal Reasoning Leaderboard.
Demonstrates effectiveness of RL incentivization without initial SFT.
Shows improved reasoning performance in 7B and 72B parameter models.
Abstract
Recent advancements in large language models (LLMs) have witnessed a surge in the development of advanced reasoning paradigms, which are now being integrated into multimodal large language models (MLLMs). However, existing approaches often fall short: methods solely employing reinforcement learning (RL) can struggle with sample inefficiency and activating entirely absent reasoning capabilities, while conventional pipelines that initiate with a cold-start supervised fine-tuning (SFT) phase before RL may restrict the model's exploratory capacity and face suboptimal convergence. In this work, we introduce Metis-RISE (RL Incentivizes and SFT Enhances) for multimodal reasoning model learning. Unlike conventional approaches, Metis-RISE distinctively omits an initial SFT stage, beginning instead with an RL phase (e.g., using a Group Relative Policy…
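The abstract names a group-relative policy-optimization style of RL and says that RL-only training can suffer from sample inefficiency. The sketch below is not taken from the paper. It assumes the common group-relative formulation in which each sampled response's reward is normalized by the mean and standard deviation of its own sampling group, and it assumes binary correctness rewards. The function names (`group_relative_advantages`, `split_prompts_by_signal`) and the example prompt IDs are hypothetical. It illustrates one reason sampling can be inefficient: when every response to a prompt gets the same reward, all advantages are zero and that prompt contributes no learning signal.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other rewards in its sampling group.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def split_prompts_by_signal(group_rewards):
    """Partition prompts by whether their sampled responses give RL any signal.

    Assumption: rewards are binary (1 = correct, 0 = incorrect), so a group
    with identical rewards is either always right or always wrong.
    """
    informative, always_wrong, always_right = [], [], []
    for prompt_id, rewards in group_rewards.items():
        if max(rewards) == min(rewards):
            # Identical rewards: every advantage is zero, so no gradient signal.
            (always_right if rewards[0] > 0 else always_wrong).append(prompt_id)
        else:
            informative.append(prompt_id)
    return informative, always_wrong, always_right


# Hypothetical rollout rewards: four sampled responses per prompt.
rollouts = {
    "q1": [1, 0, 0, 1],  # mixed outcomes: useful signal
    "q2": [0, 0, 0, 0],  # never solved: zero advantages
    "q3": [1, 1, 1, 1],  # always solved: zero advantages
}
informative, always_wrong, always_right = split_prompts_by_signal(rollouts)
```

Under these assumptions, prompts such as `q2` are ones where RL alone makes no progress. Whether and how Metis-RISE uses such prompts in its SFT stage is not stated in the text above, so this partition is only an illustration of where capability gaps might surface.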
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Shrink and Fine-Tune
