Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Jiaqi Wang; Kevin Qinghong Lin; James Cheng; Mike Zheng Shou

arXiv:2505.16854·cs.AI·October 30, 2025

TL;DR

This paper introduces TON, a two-stage training strategy enabling vision-language models to selectively decide when to reason, significantly reducing unnecessary reasoning steps while maintaining or improving task performance.

Contribution

TON is the first method to incorporate a think-or-not decision process into RL-based reasoning for VLMs, reducing reasoning length by up to 90% without performance loss.

Findings

01 TON reduces reasoning length by up to 90%.

02 Models learn to bypass unnecessary reasoning as training progresses.

03 TON improves reasoning efficiency across multiple tasks and models.

Abstract

Reinforcement Learning (RL) has proven to be an effective post-training strategy for enhancing reasoning in vision-language models (VLMs). Group Relative Policy Optimization (GRPO) is a prominent recent method that encourages models to generate complete reasoning traces before answering, which increases token usage and computational cost. Inspired by the human thinking process, in which people skip reasoning for easy questions but think carefully when needed, we explore how to enable VLMs to first decide when reasoning is necessary. To realize this, we propose TON, a two-stage training strategy: (i) a supervised fine-tuning (SFT) stage with a simple yet effective 'thought dropout' operation, in which reasoning traces are randomly replaced with empty thoughts, introducing a think-or-not format that serves as a cold start for selective reasoning; (ii) a GRPO stage that enables the…
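The 'thought dropout' operation in stage (i) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `<think>`/`<answer>` tag format, the `"\n\n"` empty-thought placeholder, the `drop_prob` parameter, and the sample dictionary keys are all assumptions made for illustration; the abstract states only that reasoning traces are randomly replaced with empty thoughts.

```python
import random

# Assumed placeholder for an "empty thought"; the abstract does not specify one.
EMPTY_THOUGHT = "\n\n"


def thought_dropout(samples, drop_prob=0.5, seed=0):
    """Build SFT targets, randomly replacing reasoning traces with empty thoughts.

    `samples` is a list of dicts with hypothetical keys "prompt", "thought",
    and "answer". `drop_prob` is an assumed hyperparameter.
    """
    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    out = []
    for sample in samples:
        # With probability drop_prob, discard the reasoning trace entirely.
        dropped = rng.random() < drop_prob
        thought = EMPTY_THOUGHT if dropped else sample["thought"]
        # The answer is always kept; only the thought is ever dropped.
        target = f"<think>{thought}</think><answer>{sample['answer']}</answer>"
        out.append({"prompt": sample["prompt"], "target": target})
    return out


if __name__ == "__main__":
    data = [
        {"prompt": "What is 2 + 2?", "thought": "2 plus 2 equals 4.", "answer": "4"},
        {"prompt": "How many legs does a cat have?", "thought": "Cats are quadrupeds.", "answer": "4"},
    ]
    for row in thought_dropout(data, drop_prob=0.5):
        print(repr(row["target"]))
```

Because the answer survives every dropout, the fine-tuned model sees both full-reasoning and empty-thought completions for the same kind of question, which is the think-or-not format that the GRPO stage then starts from.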

Code & Models

Repositories

kokolerk/ton · jax · Official