ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Jr-Jen Chen; Yu-Chien Liao; Hsi-Che Lin; Yu-Chu Yu; Yen-Chun Chen; Yu-Chiang Frank Wang

arXiv:2406.19392 · cs.CV · July 3, 2024

TL;DR

ReXTime is a new benchmark suite designed to evaluate AI models' ability to perform temporal reasoning across video segments, highlighting current limitations and providing a dataset for improving such reasoning.

Contribution

The paper introduces ReXTime, a novel benchmark with an automated pipeline for generating temporal reasoning questions, enabling large-scale evaluation and training of models on reasoning across video segments.

Findings

01

Frontier large language models outperform academic models but still lag behind human performance by 14.3%.

02

The automated dataset generation pipeline effectively creates training data for across-time reasoning.

03

Empirical results show that fine-tuning on the generated data improves model performance.

Abstract

We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e., human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges even to frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotations. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models…
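
The abstract's notion of reasoning across time, where a question and its answer fall in different video segments, can be made concrete with a small check. The sketch below is a minimal illustration under stated assumptions: segments are hypothetical (start, end) intervals in seconds, and `Segment`, `overlap`, and `is_across_time` are names introduced here for illustration only. They are not part of the ReXTime release, and the non-overlap test is one plausible reading of "different video segments", not the benchmark's official criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A time interval in a video, in seconds (hypothetical representation)."""
    start: float
    end: float

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("segment end must not precede its start")


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two segments (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def is_across_time(question_seg: Segment, answer_seg: Segment) -> bool:
    """True when question and answer refer to non-overlapping segments.

    Illustrative assumption only: this is not ReXTime's official definition.
    """
    return overlap(question_seg, answer_seg) == 0.0


# A question about an event at 10-15 s whose answer (e.g. its cause or
# effect) occurs at 20-28 s counts as across-time under this reading.
q = Segment(10.0, 15.0)
print(is_across_time(q, Segment(20.0, 28.0)))  # True
print(is_across_time(q, Segment(12.0, 18.0)))  # False: the segments overlap
```

Segments that merely touch at a boundary (one ends exactly where the other starts) are treated as disjoint here; a stricter variant could require a minimum gap between them.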

Code & Models

Repositories

rextime/rextime (official)

Datasets

ReXTime/ReXTime · dataset · 174 downloads

Videos

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization