Seeing the Arrow of Time in Large Multimodal Models

Zihui Xue; Mi Luo; Kristen Grauman

arXiv:2506.03340 · cs.CV · October 27, 2025

TL;DR

This paper introduces ArrowRL, a reinforcement learning strategy that enhances large multimodal models' ability to understand the arrow of time in videos, significantly improving temporal comprehension and question answering accuracy.

Contribution

The paper proposes ArrowRL, a novel RL-based training method with reverse rewards, and introduces AoTBench, a new benchmark for evaluating temporal understanding in video models.

Findings

01 ArrowRL improves temporal perception in LMMs.

02 ArrowRL yields significant accuracy gains on AoTBench and VQA benchmarks.

03 The results highlight the importance of AoT understanding in video models.

Abstract

The Arrow of Time (AoT), time's irreversible flow shaping physical events, is fundamental to video comprehension, yet remains a significant challenge for modern large multimodal models (LMMs). Current LMMs struggle to perceive and utilize temporal directionality in video when responding to language queries, obstructing deeper temporal understanding. We tackle this deficiency by first providing a critical analysis of existing benchmarks and models. We then introduce ArrowRL, a reinforcement learning (RL)-based training strategy with an innovative reverse reward that instills AoT awareness by encouraging divergent video interpretations between forward and reversed visual frames. For rigorous evaluation, we additionally develop AoTBench, a new multi-faceted benchmark probing temporally challenging questions. Experiments show ArrowRL greatly advances temporal perception: it not only achieves…
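
To make the idea of a reverse reward concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes the model has already produced one text response for the forward video and one for the time-reversed video, and it scores how much the two responses diverge. The function names `token_set` and `reverse_reward`, and the use of Jaccard dissimilarity over lowercase word sets as the divergence measure, are illustrative assumptions; the abstract does not specify how divergence is measured.

```python
from typing import Set


def token_set(text: str) -> Set[str]:
    """Split a response into a set of lowercase whitespace-delimited tokens."""
    return set(text.lower().split())


def reverse_reward(forward_response: str, reversed_response: str) -> float:
    """Hypothetical reverse reward: 1 minus the Jaccard similarity of the
    two responses' token sets.

    Returns a value in [0, 1]. Identical responses score 0.0 (the model
    ignored temporal direction); fully disjoint responses score 1.0.
    This is an assumed stand-in for the paper's reward, not its definition.
    """
    a = token_set(forward_response)
    b = token_set(reversed_response)
    if not a and not b:
        # Two empty responses are identical, so there is no divergence.
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


if __name__ == "__main__":
    # The same answer for both directions earns no reward.
    print(reverse_reward("the door opens", "the door opens"))
    # An answer that changes with playback direction earns a partial reward.
    print(reverse_reward("the door opens", "the door closes"))
```

In an RL setup, a score like this would be one term added to the usual task reward, so that a policy producing the same description for forward and reversed frames is penalized relative to one whose description tracks playback direction.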

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning