EasyVideoR1: Easier RL for Video Understanding

Chuanyu Qin; Chenxu Yang; Qingyi Si; Naibin Gu; Dingyu Yao; Zheng Lin; Peng Fu; Nan Duan; Jiaqi Wang

arXiv:2604.16893·cs.CV·April 21, 2026

TL;DR

EasyVideoR1 is a reinforcement learning framework for training and evaluating large vision-language models on diverse video understanding tasks, with optimizations aimed at training efficiency.

Contribution

The paper introduces a complete, optimized pipeline for video RL, with task-aware rewards, hybrid data training, and multi-benchmark evaluation.

Findings

1. Achieves a 1.47× throughput improvement through offline preprocessing and tensor caching.

2. Supports 11 video and image problem types with unified reward routing.

3. Reproduces benchmark scores closely aligned with official results.
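
The unified reward routing mentioned in the second finding can be pictured as a dispatch table from problem type to scoring function. The sketch below is illustrative only and is not taken from the EasyVideoR1 code: the problem-type names, the reward functions, and `route_reward` are all hypothetical, and only three types are shown rather than the 11 the paper supports.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: every reward maps (prediction, reference) to a score in [0, 1].
RewardFn = Callable[[str, str], float]


def exact_match_reward(prediction: str, answer: str) -> float:
    """1.0 when the normalized strings agree, e.g. for multiple-choice letters."""
    return 1.0 if prediction.strip().lower() == answer.strip().lower() else 0.0


def numeric_reward(prediction: str, answer: str) -> float:
    """1.0 when both parse as numbers and agree within a small relative tolerance."""
    try:
        pred, ref = float(prediction), float(answer)
    except ValueError:
        return 0.0
    return 1.0 if abs(pred - ref) <= 1e-2 * max(1.0, abs(ref)) else 0.0


def _parse_span(text: str) -> Tuple[float, float]:
    """Parse an assumed 'start,end' span in seconds."""
    start, end = (float(part) for part in text.split(","))
    return start, end


def temporal_iou_reward(prediction: str, answer: str) -> float:
    """Intersection over union of two time spans, used here as a graded reward."""
    try:
        p_start, p_end = _parse_span(prediction)
        a_start, a_end = _parse_span(answer)
    except ValueError:
        return 0.0
    intersection = max(0.0, min(p_end, a_end) - max(p_start, a_start))
    union = max(p_end, a_end) - min(p_start, a_start)
    return intersection / union if union > 0 else 0.0


# Hypothetical registry: one entry per problem type.
REWARD_ROUTER: Dict[str, RewardFn] = {
    "multiple_choice": exact_match_reward,
    "numerical": numeric_reward,
    "temporal_grounding": temporal_iou_reward,
}


def route_reward(problem_type: str, prediction: str, answer: str) -> float:
    """Look up the reward function for a sample's problem type and apply it."""
    if problem_type not in REWARD_ROUTER:
        raise ValueError(f"no reward registered for problem type: {problem_type}")
    return REWARD_ROUTER[problem_type](prediction, answer)
```

With a table like this, supporting a new task type would mean registering one more function, while the training loop keeps calling the single `route_reward` entry point.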

Abstract

Reinforcement learning from verifiable rewards (RLVR) has demonstrated remarkable effectiveness in improving the reasoning capabilities of large language models. As models evolve into natively multimodal architectures, extending RLVR to video understanding becomes increasingly important yet remains largely unexplored, due to the diversity of video task types, the computational overhead of repeatedly decoding and preprocessing high-dimensional visual inputs, and the difficulty of reproducible evaluation across numerous sensitive hyperparameters. Existing open-source RL training frameworks provide solid infrastructure for text and image scenarios but lack systematic optimizations tailored for the video modality. In this work, we present EasyVideoR1, a complete and efficient reinforcement learning framework specifically designed for training large vision-language models on video…
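
The cost of repeatedly decoding and preprocessing the same video, which the abstract names as an obstacle, is what an on-disk cache of preprocessed inputs is meant to avoid. The following is a minimal sketch of that general idea and not the framework's implementation: `cache_key`, `load_or_preprocess`, and the `preprocess` callback are hypothetical names, and `pickle` stands in for whatever tensor serialization a real pipeline would use.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Any, Callable


def cache_key(video_path: str, num_frames: int, resolution: int) -> str:
    """Hash the video path with the preprocessing settings, so a change in
    settings produces a different cache entry instead of a stale hit."""
    raw = f"{video_path}|{num_frames}|{resolution}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def load_or_preprocess(
    video_path: str,
    num_frames: int,
    resolution: int,
    cache_dir: str,
    preprocess: Callable[[str, int, int], Any],
) -> Any:
    """Return cached features if present; otherwise preprocess once and store them."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{cache_key(video_path, num_frames, resolution)}.pkl"

    if target.exists():
        with target.open("rb") as handle:
            return pickle.load(handle)

    # Cache miss: run the (assumed expensive) decode and preprocessing step.
    features = preprocess(video_path, num_frames, resolution)

    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated cache entry behind.
    temporary = target.with_suffix(".tmp")
    with temporary.open("wb") as handle:
        pickle.dump(features, handle)
    temporary.replace(target)
    return features
```

Under this scheme each video is decoded once per preprocessing configuration, and later epochs or repeated rollouts read the stored result instead of decoding again.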

Code & Models

Repositories

cyuq1n/EasyVideoR1 (GitHub)
