VideoP2R: Video Understanding from Perception to Reasoning

Yifan Jiang; Yueying Wang; Rui Zhao; Toufiq Parag; Zhimin Chen; Zhenyu Liao; Jayakrishnan Unnikrishnan

arXiv:2511.11113·cs.CV·April 21, 2026

TL;DR

VideoP2R introduces a process-aware reinforcement fine-tuning (RFT) framework for large video language models (LVLMs). By modeling perception and reasoning as separate processes, it improves video reasoning and achieves state-of-the-art results on six of seven benchmarks.

Contribution

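It develops a process-aware video RFT framework with two components: a new chain-of-thought dataset (VideoP2R-CoT-162K) that separates perception from reasoning, and a policy optimization algorithm (PA-GRPO) that rewards the two processes separately.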
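
Modeling perception and reasoning as distinct processes implies that each model response can be split into a perception segment and a reasoning segment. The following is a minimal sketch of such a split, assuming a hypothetical tag layout (<perception>, <reasoning>, <answer>); this page does not state how VideoP2R-CoT-162K or the trained model actually delimits the two stages, so the tag names, the ProcessAwareCoT record and parse_response are all illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical layout: perception first, then reasoning, then the final answer.
_PATTERN = re.compile(
    r"<perception>(?P<perception>.*?)</perception>\s*"
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*"
    r"<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,  # let segments span multiple lines
)


@dataclass
class ProcessAwareCoT:
    perception: str  # what the model reports observing in the video
    reasoning: str   # inference steps built on the perception text
    answer: str      # final answer


def parse_response(text: str) -> Optional[ProcessAwareCoT]:
    """Split a response into perception, reasoning and answer segments.

    Returns None when the response does not follow the expected layout,
    so a caller could treat it as a format failure.
    """
    m = _PATTERN.search(text)
    if m is None:
        return None
    return ProcessAwareCoT(
        perception=m.group("perception").strip(),
        reasoning=m.group("reasoning").strip(),
        answer=m.group("answer").strip(),
    )


# Usage with a made-up response:
example = (
    "<perception>A man pours water into a red cup.</perception>"
    "<reasoning>The cup was empty before, so it now holds water.</reasoning>"
    "<answer>B</answer>"
)
parsed = parse_response(example)  # parsed.answer == "B"
```

Keeping the segments as separate fields is what makes it possible to score perception and reasoning independently later on.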

Findings

1. Achieves state-of-the-art (SotA) performance on six out of seven benchmarks.

2. The perception output is sufficient for downstream reasoning.

3. Process-aware modeling and PA-GRPO both improve performance.

Abstract

Reinforcement fine-tuning (RFT), a two-stage framework consisting of supervised fine-tuning (SFT) and reinforcement learning (RL), has shown promising results in improving the reasoning ability of large language models (LLMs). Yet extending RFT to large video language models (LVLMs) remains challenging. We propose VideoP2R, a novel process-aware video RFT framework that enhances video reasoning by modeling perception and reasoning as distinct processes. In the SFT stage, we develop a three-step pipeline to generate VideoP2R-CoT-162K, a high-quality, process-aware chain-of-thought (CoT) dataset for perception and reasoning. In the RL stage, we introduce a novel process-aware group relative policy optimization (PA-GRPO) algorithm that supplies separate rewards for perception and reasoning. Extensive experiments show that VideoP2R achieves state-of-the-art (SotA) performance on six out of seven…
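
In standard GRPO, a group of G responses is sampled for the same prompt, each response is scored, and each reward is standardized against the group's mean and standard deviation to give its advantage. The abstract says PA-GRPO supplies separate rewards for perception and reasoning but does not spell out how they enter the objective, so the sketch below is one plausible reading rather than the authors' implementation: each reward type is normalized within the group on its own, and the resulting advantage is attached only to the tokens of the matching segment. The segment labels, the function names and the zero advantage for padding tokens are assumptions, how the perception and reasoning rewards themselves are computed is left open, and the clipped policy-ratio loss and KL term of full GRPO are omitted.

```python
import math
from typing import List, Sequence

# Hypothetical per-token segment labels.
PERCEPTION, REASONING, PAD = 0, 1, -1


def group_relative_advantage(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize rewards within one group of sampled responses.

    Uses the population standard deviation; implementations differ on this detail.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def process_aware_advantages(
    perception_rewards: Sequence[float],
    reasoning_rewards: Sequence[float],
    segment_labels: Sequence[Sequence[int]],
    eps: float = 1e-6,
) -> List[List[float]]:
    """Per-token advantages for a group of G responses (hypothetical scheme).

    Perception tokens receive the group-normalized perception advantage,
    reasoning tokens the group-normalized reasoning advantage, padding 0.
    """
    a_perc = group_relative_advantage(perception_rewards, eps)
    a_reas = group_relative_advantage(reasoning_rewards, eps)
    out: List[List[float]] = []
    for i, labels in enumerate(segment_labels):
        row = []
        for lab in labels:
            if lab == PERCEPTION:
                row.append(a_perc[i])
            elif lab == REASONING:
                row.append(a_reas[i])
            else:  # PAD or any unknown label contributes no gradient signal
                row.append(0.0)
        out.append(row)
    return out


# Usage: two responses; the first perceives correctly but answers wrongly,
# the second does the opposite.
adv = process_aware_advantages(
    perception_rewards=[1.0, 0.0],
    reasoning_rewards=[0.0, 1.0],
    segment_labels=[[0, 0, 1, 1], [0, 1, 1, -1]],
)
```

In this sketch, a response with a well-rewarded perception segment but a wrong answer still gets a positive signal on its perception tokens, which a single outcome reward shared across the whole response would not provide.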

Code & Models

Repositories

https://videop2r.github.io/videop2r