VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations

Lu Dong; Haiyu Zhang; Han Lin; Ziang Yan; Xiangyu Zeng; Hongjie Zhang; Yifei Huang; Yi Wang; Zhen-Hua Ling; Limin Wang; and Yali Wang

arXiv:2510.23397 · cs.CV · October 28, 2025

TL;DR

VideoTG-R1 introduces a curriculum reinforcement learning framework with reflected boundary annotations to improve video temporal grounding, particularly when training data and compute are limited.

Contribution

It proposes a novel curriculum RL approach that uses a boundary reflection agent and a difficulty estimation agent to improve data efficiency and training effectiveness in video temporal grounding (VTG).

Findings

01. Outperforms full-data models using only 10% of training samples.
02. Reduces training time by 79% compared to full-data training.
03. Effective on VTG and grounded VideoQA tasks.

Abstract

Video temporal grounding (VTG) aims to locate precise segments in videos based on language queries, which is a fundamental challenge in video understanding. While recent Multimodal Large Language Models (MLLMs) have shown promise in tackling VTG through reinforcement learning (RL), they overlook the challenges arising from both the quality and difficulty of training samples. (1) Partially annotated samples. Many samples contain relevant segments beyond the annotated interval, introducing ambiguous supervision. (2) Hard-to-ground samples. Samples with poor zero-shot performance produce consistently low and indistinguishable rewards during RL training, exhibiting no clear preference among multiple outputs and thus hindering learning efficiency. To address these challenges, we propose VideoTG-R1, a novel curriculum RL framework with reflected boundary annotations, enabling data-efficient…
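The second challenge in the abstract can be illustrated with a small sketch. It assumes two things that this page does not confirm: that the reward is the temporal IoU between a predicted and an annotated interval, and that advantages are computed by normalizing rewards within a group of sampled outputs (as in group-relative policy optimization). The function names `temporal_iou` and `group_advantages`, and all the intervals, are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def temporal_iou(pred, gt):
    """IoU between two (start, end) intervals given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def group_advantages(rewards, eps=1e-6):
    """Assumed group-relative advantage: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical annotated interval for one query.
gt = (30.0, 40.0)

# Sampled outputs for a sample the model can partly ground.
easy_preds = [(29.0, 41.0), (32.0, 45.0), (10.0, 20.0), (30.0, 39.0)]
# Sampled outputs for a hard-to-ground sample: every prediction misses.
hard_preds = [(0.0, 5.0), (50.0, 60.0), (70.0, 80.0), (5.0, 12.0)]

easy_rewards = [temporal_iou(p, gt) for p in easy_preds]
hard_rewards = [temporal_iou(p, gt) for p in hard_preds]

easy_adv = group_advantages(easy_rewards)
hard_adv = group_advantages(hard_rewards)

print("easy rewards:", easy_rewards, "advantages:", easy_adv)
print("hard rewards:", hard_rewards, "advantages:", hard_adv)
```

Under these assumptions, the hard sample yields identical rewards for all outputs, so every advantage is zero and the group provides no preference signal, whereas the easier sample separates good outputs from bad ones.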

Code & Models

Models: Lu9876/VideoTG_R1 (Hugging Face model)
