Reinforcing Video Reasoning Segmentation to Think Before It Segments

Sitong Gong; Lu Zhang; Yunzhi Zhuge; Xu Jia; Pingping Zhang; Huchuan Lu

arXiv:2508.11538 · cs.CV · March 5, 2026

Open Access

TL;DR

This paper introduces Veason-R1, an LVLM trained with reinforcement learning for video reasoning segmentation. It improves interpretability and spatiotemporal reasoning and achieves state-of-the-art results on multiple benchmarks.

Contribution

The paper presents Veason-R1, an LVLM for video reasoning segmentation (VRS) that learns structured reasoning through Chain-of-Thought (CoT) initialization and Group Relative Policy Optimization (GRPO), improving both performance and interpretability.

Findings

01

Achieves +1.3 J&F on the ReVOS benchmark and +10.0 J&F on the ReasonVOS benchmark.

02

Demonstrates robustness to hallucinations, with a +8.8 improvement in the R score.

03

Significantly outperforms prior methods in video reasoning segmentation.
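
For readers unfamiliar with the J&F figure quoted in the findings: in the video object segmentation literature, J is region similarity (intersection over union of predicted and ground-truth masks), F is a contour-accuracy F-measure, and J&F is their mean. The sketch below illustrates J and the final averaging only. It is not the paper's evaluation code: the function names are hypothetical, masks are assumed to be nested lists of 0/1 values, scoring two empty masks as 1.0 is an assumed convention, and F is taken as a given number because computing it requires boundary matching.

```python
def region_similarity(pred_mask, gt_mask):
    """J: intersection over union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


def j_and_f(j_score, f_score):
    """J&F: arithmetic mean of region similarity and contour accuracy."""
    return (j_score + f_score) / 2


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [0, 0]]
    j = region_similarity(pred, gt)
    # The F value here is a made-up placeholder, not a measured result.
    print(j, j_and_f(j, 0.7))
```

Benchmark scores such as those above are typically averaged over all frames and objects and reported as percentages, so a "+1.3 J&F" gain refers to 1.3 points on that scale.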

Abstract

Video reasoning segmentation (VRS) endeavors to delineate referred objects in videos guided by implicit instructions that encapsulate human intent and temporal logic. Previous approaches leverage large vision language models (LVLMs) to encode object semantics into <SEG> tokens for mask prediction. However, this paradigm suffers from limited interpretability during inference and suboptimal performance due to inadequate spatiotemporal reasoning. Drawing inspiration from seminal breakthroughs in reinforcement learning, we introduce Veason-R1, a specialized LVLM for VRS that emphasizes structured reasoning in segmentation. Veason-R1 is trained through Group Relative Policy Optimization (GRPO) augmented with Chain-of-Thought (CoT) initialization. To begin with, we curate high-quality CoT training data to instill structured reasoning trajectories, bridging video-level semantics and…
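
The core idea of GRPO mentioned in the abstract can be sketched briefly: several responses are sampled for the same prompt, each is scored with a reward, and each response's advantage is its reward normalized by the mean and standard deviation of its own group, so no separate value network is needed. The snippet below is a minimal illustration of that normalization step only. The function name, the use of the population standard deviation, and the `eps` term are assumptions, and the reward values are placeholders; the paper's actual reward design is not shown in this excerpt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group."""
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population std; an assumed choice
    # eps avoids division by zero when every response gets the same reward.
    return [(r - group_mean) / (group_std + eps) for r in rewards]


if __name__ == "__main__":
    # Placeholder rewards for four sampled responses to one prompt.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(rewards))
```

Responses scoring above their group's mean receive positive advantages and are reinforced, while those below receive negative ones. If all responses in a group earn the same reward, every advantage is zero and that group contributes no learning signal.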

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)