Towards Temporal Compositional Reasoning in Long-Form Sports Videos

Siyu Cao, Lu Zhang, Ruizhe Zeng, and Zhi-yong Liu

arXiv:2604.22226 · cs.CV · April 27, 2026

TL;DR

This paper introduces SportsTime, a large-scale benchmark for long-form sports video understanding, and proposes Chain-of-Time Reasoning (CoTR), a method that improves temporal reasoning and evidence grounding in multimodal models.

Contribution

The paper presents a new benchmark dataset, SportsTime, and a novel reasoning method, CoTR, that enhances temporal compositional reasoning in sports videos.

Findings

01. CoTR improves temporal reasoning accuracy over baselines.

02. SportsTime enables better evaluation of long-horizon reasoning in sports videos.

03. CoTR enhances step-wise evidence grounding quality.

Abstract

Sports videos are a challenging domain for multimodal understanding because they involve complex and dynamic human activities. Despite rapid progress in Multimodal Large Language Models (MLLMs), long-horizon reasoning in sports videos remains difficult, as answering questions requires both locating temporally sparse evidence and integrating it into reasoning. We attribute this limitation to two closely coupled factors: insufficient supervision over temporally dispersed evidence, and the lack of methods that require models to identify, localize, and justify temporal evidence. To address these gaps, we introduce SportsTime, a large-scale benchmark for long-form sports video understanding, comprising 14K+ open-ended QA pairs and 50K+ step-wise temporal evidence annotations. Building on SportsTime, we propose Chain-of-Time Reasoning (CoTR), which treats reasoning as a process of temporally…
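To make the idea of "step-wise temporal evidence annotations" concrete, here is a minimal sketch of how one QA item with time-stamped evidence steps might be represented, and how predicted evidence spans could be scored against it. Everything in it is an assumption made for illustration: the class names `EvidenceStep` and `QAItem`, the field names, the sample question, and the choice of temporal IoU as the grounding score are all hypothetical and are not taken from the paper, which may use a different schema and different metrics.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds); hypothetical convention


@dataclass
class EvidenceStep:
    """One annotated reasoning step tied to a time span (hypothetical schema)."""
    description: str
    start_s: float
    end_s: float


@dataclass
class QAItem:
    """An open-ended QA pair with ordered evidence steps (hypothetical schema)."""
    question: str
    answer: str
    evidence: List[EvidenceStep]


def temporal_iou(pred: Span, gold: Span) -> float:
    """Intersection-over-union of two time spans; 0.0 if they do not overlap."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_step_iou(item: QAItem, predicted: List[Span]) -> float:
    """Average temporal IoU over steps, pairing predictions with steps in order."""
    if len(predicted) != len(item.evidence):
        raise ValueError("expected one predicted span per evidence step")
    if not predicted:
        return 0.0
    scores = [
        temporal_iou(p, (step.start_s, step.end_s))
        for p, step in zip(predicted, item.evidence)
    ]
    return sum(scores) / len(scores)


# Illustrative, made-up item; not drawn from SportsTime.
item = QAItem(
    question="Which team scored after the first substitution?",
    answer="The home team",
    evidence=[
        EvidenceStep("First substitution is made", 600.0, 620.0),
        EvidenceStep("Next goal is scored", 900.0, 910.0),
    ],
)
score = mean_step_iou(item, [(600.0, 620.0), (905.0, 915.0)])
```

Pairing predictions with annotated steps by position is the simplest possible choice; a real evaluation might instead match steps by content or allow a different number of predicted steps.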
