Reinforcing Video Reasoning with Focused Thinking

Jisheng Dang; Jingze Wu; Teng Wang; Xuanhui Lin; Nannan Zhu; Hongbo Chen; Wei-Shi Zheng; Meng Wang; Tat-Seng Chua

arXiv:2505.24718 · cs.CV · June 10, 2025

TL;DR

This paper introduces TW-GRPO, a reinforcement learning framework for video reasoning that gives more weight to informative tokens and replaces binary rewards with denser ones that credit partially correct answers. The authors report state-of-the-art results on multiple benchmarks.

Contribution

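The paper proposes TW-GRPO, which enhances visual reasoning with token weighting, multi-choice rewards, and data augmentation. These target two limitations of earlier GRPO-based methods: unfocused, verbose reasoning chains, and binary rewards that ignore partially correct answers.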
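
To make the contrast between a binary reward and a denser multi-choice reward concrete, here is a minimal sketch. It assumes partial credit is measured as the overlap (intersection over union) between the predicted and ground-truth option sets; that choice, and the function names `binary_reward` and `soft_reward`, are illustrative assumptions and not taken from the paper or its repository.

```python
def binary_reward(predicted, gold):
    """All-or-nothing reward: 1.0 only when the option sets match exactly."""
    return 1.0 if set(predicted) == set(gold) else 0.0


def soft_reward(predicted, gold):
    """Partial-credit reward (assumed form): intersection over union of option sets."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to choose and nothing chosen counts as correct
    return len(predicted & gold) / len(predicted | gold)


# A response that picks two of the three correct options:
prediction = ["A", "B"]
answer = ["A", "B", "C"]
print(binary_reward(prediction, answer))  # no credit under the binary scheme
print(soft_reward(prediction, answer))    # partial credit under the overlap scheme
```

Under the binary scheme, a nearly correct response and a completely wrong one receive the same signal; an overlap-based score separates them, which is one plausible way to reduce reward variance within a sampled group.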

Findings

01. Achieves 50.4% accuracy on CLEVRER, outperforming previous models.

02. Improves MMVU accuracy to 65.8%.

03. Demonstrates effectiveness of focused reasoning and dense rewards.

Abstract

Recent advancements in reinforcement learning, particularly through Group Relative Policy Optimization (GRPO), have significantly improved multimodal large language models for complex reasoning tasks. However, two critical limitations persist: 1) existing methods often produce unfocused, verbose reasoning chains that obscure salient spatiotemporal cues, and 2) binary rewarding fails to account for partially correct answers, resulting in high reward variance and inefficient learning. In this paper, we propose TW-GRPO, a novel framework that enhances visual reasoning with focused thinking and dense reward granularity. Specifically, we employ a token weighting mechanism that prioritizes tokens with high informational density (estimated by intra-group information entropy), suppressing redundant tokens like generic reasoning prefixes. Furthermore, we reformulate RL training by shifting from…
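
The token weighting idea in the abstract can be illustrated with a small sketch. It assumes that "intra-group information entropy" can be approximated by the Shannon entropy of the tokens that a group of sampled responses produces at each position, so that positions where every sample agrees (such as a shared reasoning prefix) receive a low weight. The helper names `position_entropies` and `token_weights`, the `floor` parameter, and the normalization are hypothetical and chosen for illustration; the paper's exact formulation may differ.

```python
import math
from collections import Counter


def position_entropies(group):
    """Shannon entropy (in nats) of the tokens seen at each position across a group."""
    max_len = max(len(seq) for seq in group)
    entropies = []
    for t in range(max_len):
        tokens = [seq[t] for seq in group if t < len(seq)]
        counts = Counter(tokens)
        n = len(tokens)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return entropies


def token_weights(group, floor=0.1):
    """Map per-position entropy to a weight in [floor, 1] (assumed normalization)."""
    entropies = position_entropies(group)
    peak = max(entropies)
    if peak == 0:
        # All samples are identical, so no position is more informative than another.
        return [1.0] * len(entropies)
    return [floor + (1 - floor) * h / peak for h in entropies]


# Three sampled responses that share a generic prefix and differ in the answer tokens.
group = [
    ["let", "me", "think", "red", "cube"],
    ["let", "me", "think", "blue", "ball"],
    ["let", "me", "think", "red", "ball"],
]
# The shared prefix positions get the floor weight; the varying positions get more.
print(token_weights(group))
```

In a GRPO-style objective, weights like these would multiply the per-token loss terms so that updates concentrate on positions where the sampled responses actually diverge.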

Code & Models

Repositories

longmalongma/tw-grpo (PyTorch, official)

Models

Falconss1/TW-GRPO (Hugging Face model, 97 downloads)
