LiveQA: A Question Answering Dataset over Sports Live
Qianying Liu, Sicong Jiang, Yizhong Wang, Sujian Li

TL;DR
LiveQA is a large, challenging dataset for question answering based on live sports broadcasts, requiring timeline understanding, event tracking, and calculations, with models currently performing poorly.
Contribution
We introduce LiveQA, a novel dataset from live NBA broadcasts that tests reasoning over timeline-based sports events, filling a gap in existing QA datasets.
Findings
A strong baseline model achieves only 53.1% accuracy and does not beat the dominant option rule.
The dataset challenges current question answering models.
LiveQA enables future research in reasoning over live sports data.
Abstract
In this paper, we introduce LiveQA, a new question answering dataset constructed from play-by-play live broadcast. It contains 117k multiple-choice questions written by human commentators for over 1,670 NBA games, which are collected from the Chinese Hupu (https://nba.hupu.com/games) website. Derived from the characteristics of sports games, LiveQA can potentially test the reasoning ability across timeline-based live broadcasts, which is challenging compared to the existing datasets. In LiveQA, the questions require understanding the timeline, tracking events or doing mathematical computations. Our preliminary experiments show that the dataset introduces a challenging problem for question answering models, and a strong baseline model only achieves the accuracy of 53.1% and cannot beat the dominant option rule. We release the code and data of this paper for future research.
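The abstract compares the baseline against a "dominant option rule" without defining it here. As a rough sketch, assuming the rule means always predicting the answer option that occurs most often in the training questions, its accuracy could be computed as below. The function name `dominant_option_accuracy` and the toy labels are hypothetical and are not taken from the LiveQA code or data.

```python
from collections import Counter


def dominant_option_accuracy(train_answers, test_answers):
    """Return (dominant option, accuracy of always predicting it).

    Assumption: the "dominant option rule" is read as a majority-class
    baseline over answer-option indices. This is an interpretation,
    not the paper's confirmed definition.
    """
    if not train_answers or not test_answers:
        raise ValueError("both answer lists must be non-empty")
    # Most frequent gold option among the training questions.
    dominant, _ = Counter(train_answers).most_common(1)[0]
    # Fraction of test questions whose gold option equals the dominant one.
    correct = sum(1 for answer in test_answers if answer == dominant)
    return dominant, correct / len(test_answers)


# Toy, made-up gold option indices (NOT LiveQA data).
train = [0, 0, 1, 0, 1, 0]
test = [0, 1, 0, 0]
dominant, acc = dominant_option_accuracy(train, test)
print(dominant, acc)
```

A learned model is only informative on a multiple-choice dataset if it exceeds this kind of trivial rule, which is why the comparison matters for the reported 53.1% figure.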
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
