SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

Fangxun Shu; Yongjie Ye; Yue Liao; Zijian Kang; Weijie Yin; Jiacong Wang; Xiao Liang; Shuicheng Yan; Chao Feng

arXiv:2511.02280 · cs.CV · February 4, 2026

Open Access

TL;DR

SAIL-RL is a reinforcement learning framework that improves multimodal large language models' reasoning by teaching them when and how to think, using a dual reward system for better adaptability and reliability.

Contribution

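The paper introduces a dual-reward RL tuning method that improves reasoning and adaptability in multimodal large language models. It targets two limitations of earlier approaches: outcome-only supervision, which rewards correct answers without checking the reasoning behind them, and uniform thinking strategies, which overthink simple tasks and underthink complex ones.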

Findings

01. Improves results on reasoning and multimodal understanding benchmarks at both 4B and 8B scales

02. Reduces hallucinations significantly

03. Achieves performance competitive with GPT-4o

Abstract

We introduce SAIL-RL, a reinforcement learning (RL) post-training framework that enhances the reasoning capabilities of multimodal large language models (MLLMs) by teaching them when and how to think. Existing approaches are limited by outcome-only supervision, which rewards correct answers without ensuring sound reasoning, and by uniform thinking strategies, which often lead to overthinking on simple tasks and underthinking on complex ones. SAIL-RL addresses these challenges with a dual reward system: the Thinking Reward, which evaluates reasoning quality through factual grounding, logical coherence, and answer consistency, and the Judging Reward, which adaptively determines whether deep reasoning or direct answering is appropriate. Experiments on the state-of-the-art SAIL-VL2 show that SAIL-RL improves reasoning and multimodal understanding benchmarks at both 4B and 8B scales,…
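The dual reward described in the abstract can be illustrated with a minimal sketch. The abstract names the components (a Thinking Reward built on factual grounding, logical coherence, and answer consistency, and a Judging Reward for the think-or-answer-directly decision) but does not say how they are scored or combined. Everything below is therefore an assumption made for illustration: the `ThinkingScores` container, the plain averaging, the 0/1 judging score, the `needs_thinking` reference label, and the weights `w_answer`, `w_think`, and `w_judge` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThinkingScores:
    """Hypothetical per-response scores in [0, 1], e.g. produced by a judge model."""
    factual_grounding: float
    logical_coherence: float
    answer_consistency: float


def thinking_reward(scores: ThinkingScores) -> float:
    # Assumption: a plain average of the three criteria named in the abstract.
    total = (scores.factual_grounding
             + scores.logical_coherence
             + scores.answer_consistency)
    return total / 3.0


def judging_reward(chose_to_think: bool, needs_thinking: bool) -> float:
    # Assumption: 1.0 when the model's think / answer-directly decision matches
    # a reference label for the task, 0.0 otherwise.
    return 1.0 if chose_to_think == needs_thinking else 0.0


def total_reward(answer_correct: bool,
                 chose_to_think: bool,
                 needs_thinking: bool,
                 scores: Optional[ThinkingScores] = None,
                 w_answer: float = 1.0,
                 w_think: float = 0.5,
                 w_judge: float = 1.0) -> float:
    # Assumption: a weighted sum of the outcome term and the two extra rewards.
    reward = w_answer * (1.0 if answer_correct else 0.0)
    reward += w_judge * judging_reward(chose_to_think, needs_thinking)
    # The thinking term only applies when a reasoning trace was produced.
    if chose_to_think and scores is not None:
        reward += w_think * thinking_reward(scores)
    return reward
```

With these illustrative weights, a correct direct answer on a task labeled as not needing reasoning scores higher than a correct answer that reasons at length on the same task, which is the kind of pressure against overthinking that the abstract describes. The actual reward shaping in SAIL-RL may differ substantially.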

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)