TimeLogic: A Temporal Logic Benchmark for Video QA
Sirnam Swetha, Hilde Kuehne, Mubarak Shah

TL;DR
This paper introduces TimeLogic, a benchmark for evaluating temporal logical reasoning in VideoQA models, using automatically generated QA pairs from existing datasets to assess understanding of event sequences and their temporal relationships.
Contribution
The paper presents a scalable, automatic framework for generating temporal logical QA pairs, enabling comprehensive benchmarking of VideoQA models' temporal reasoning abilities.
Findings
Existing VideoQA models show limited temporal logical reasoning capabilities.
The TLQA benchmark covers 16 categories of temporal logic with varying complexity.
Large-scale datasets with up to 160k QA pairs are created for robust evaluation.
Abstract
Temporal logical understanding, a core facet of human cognition, plays a pivotal role in capturing complex sequential events and their temporal relationships within videos. This capability is particularly crucial in tasks like Video Question Answering (VideoQA), where the goal is to process visual data over time together with textual data to provide coherent answers. However, current VideoQA benchmarks devote little focus to evaluating this critical skill due to the challenge of annotating temporal logic. Despite the advancement of vision-language models, assessing their temporal logical reasoning powers remains a challenge, primarily due to the lack of QA pairs that demand formal, complex temporal reasoning. To bridge this gap, we introduce the TimeLogic QA (TLQA) framework to automatically generate QA pairs specifically designed to evaluate temporal logical understanding. To…
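As a rough illustration of what automatic generation of temporal QA pairs could look like, the sketch below builds yes/no questions from timestamped event annotations. It is not the authors' implementation: the `Event` fields, the question template, and the single "before" relation are all assumptions made for this example, and the relation shown is not claimed to match any of the 16 TLQA categories.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Event:
    """A hypothetical annotated event: a label with start/end times in seconds."""
    label: str
    start: float
    end: float


def happens_before(a: Event, b: Event) -> bool:
    """True when event a finishes no later than event b starts."""
    return a.end <= b.start


def generate_before_qa(events: list[Event]) -> list[tuple[str, str]]:
    """Build one yes/no question for every ordered pair of distinct events.

    The template is an assumption for illustration only.
    """
    qa_pairs = []
    for a, b in permutations(events, 2):
        question = f"Does '{a.label}' happen before '{b.label}'?"
        answer = "yes" if happens_before(a, b) else "no"
        qa_pairs.append((question, answer))
    return qa_pairs


# Made-up annotations for a single video.
events = [
    Event("pour milk", 0.0, 5.0),
    Event("stir", 6.0, 10.0),
]
qa = generate_before_qa(events)
for question, answer in qa:
    print(question, answer)
```

Because the answers are derived from the timestamps rather than written by hand, the same annotations can be reused for other relations by swapping in a different predicate, which is the kind of scalability the abstract points to.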
