Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports

Haopeng Li; Andong Deng; Jun Liu; Hossein Rahmani; Yulan Guo; Bernt Schiele; Mohammed Bennamoun; Qiuhong Ke

arXiv:2401.01505 · cs.CV · January 6, 2026 · 2 cites

TL;DR

This paper introduces Sports-QA, a large-scale dataset for sports video question answering, and proposes the Auto-Focus Transformer (AFT), a model for understanding complex sports videos that achieves state-of-the-art results on the new dataset.

Contribution

The paper presents the first sports-specific VideoQA dataset and a novel Auto-Focus Transformer model for fine-grained temporal reasoning in sports videos.

Findings

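01. The Auto-Focus Transformer (AFT) achieves state-of-the-art performance on Sports-QA.

02. Sports-QA covers diverse question types (descriptive, chronological, causal, and counterfactual) across multiple sports.

03. The Auto-Focus mechanism improves the model's focus on relevant temporal information.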

Abstract

Reasoning over sports videos for question answering is an important task with numerous applications, such as player training and information retrieval. However, this task has not been explored due to the lack of relevant datasets and the challenging nature it presents. Most datasets for video question answering (VideoQA) focus mainly on general and coarse-grained understanding of daily-life videos, which is not applicable to sports scenarios requiring professional action understanding and fine-grained motion analysis. In this paper, we introduce the first dataset, named Sports-QA, specifically designed for the sports VideoQA task. The Sports-QA dataset includes various types of questions, such as descriptions, chronologies, causalities, and counterfactual conditions, covering multiple sports. Furthermore, to address the characteristics of the sports VideoQA task, we propose a new…

Code & Models

Repositories

hoplee6/sports-qa (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Label Smoothing · Adam · Dropout · Absolute Position Encodings · Layer Normalization