Learning Spatiotemporal Sensitivity in Video LLMs via Counterfactual Reinforcement Learning

Dazhao Du; Jian Liu; Jialong Qin; Tao Han; Bohai Gu; Fangqi Zhu; Yujia Zhang; Eric Liu; Xi Chen; Song Guo

arXiv:2605.21988·cs.CV·May 22, 2026

TL;DR

This paper introduces CRPO, a reinforcement learning framework that improves the spatiotemporal sensitivity of Video LLMs by training on counterfactual videos with a relation reward, and evaluates it on a new benchmark.

Contribution

The paper proposes a dual-branch RL method with counterfactual data augmentation and a relation reward to improve spatiotemporal understanding in Video LLMs.

Findings

01 CRPO outperforms prior RL methods on spatiotemporal-sensitive benchmarks.

02 CRPO improves model sensitivity to video dynamics without sacrificing static performance.

03 The DyBench benchmark effectively measures the spatiotemporal sensitivity of Video LLMs.

Abstract

Video large language models (Video LLMs) achieve strong benchmark accuracy, yet often answer video questions through shortcuts such as single-frame cues and language priors rather than by tracking spatiotemporal dynamics. This issue is exacerbated in RL post-training, where correctness-only rewards can further reinforce shortcut policies that obtain high reward without tracking video dynamics. We address this by asking a controlled counterfactual question: if the visual world changed while the question remained fixed, should the answer change or stay the same? Based on this view, we propose Counterfactual Relational Policy Optimization (CRPO), a dual-branch RL framework for improving spatiotemporal sensitivity. CRPO constructs counterfactual videos through horizontal flips and temporal reversals, trains on both original and counterfactual branches, and introduces a…
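The two counterfactual transformations named in the abstract, horizontal flips and temporal reversals, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the NumPy representation, and the `(T, H, W, C)` frame layout are all assumptions made for illustration.

```python
import numpy as np


def horizontal_flip(video: np.ndarray) -> np.ndarray:
    """Mirror every frame left-to-right.

    Assumes (hypothetically) a video array of shape (T, H, W, C):
    frames, height, width, channels.
    """
    return video[:, :, ::-1, :]


def temporal_reverse(video: np.ndarray) -> np.ndarray:
    """Play the frames in reverse order along the time axis."""
    return video[::-1]


# Toy "video": 3 frames of 2x4 pixels with a single channel.
video = np.arange(3 * 2 * 4 * 1).reshape(3, 2, 4, 1)

flipped = horizontal_flip(video)          # spatial counterfactual
reversed_video = temporal_reverse(video)  # temporal counterfactual
```

Both transformations are their own inverse, so applying one twice recovers the original clip. Whether the answer to a question should change under a given transformation depends on the question itself: a question about left versus right motion would be expected to change under a horizontal flip, while a question about object color would not.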

Code & Models

Repositories

https://ddz16.github.io/crpo.github.io
github

Datasets

ddz16/DyBench
dataset · 37 dl
