SportQA: A Benchmark for Sports Understanding in Large Language Models

Haotian Xia; Zhengbang Yang; Yuqing Wang; Rhys Tracy; Yun Zhao; Dongdong Huang; Zezhi Chen; Yan Zhu; Yuan-fang Wang; Weining Shen

arXiv:2402.15862 · cs.CL · June 19, 2024 · 1 citation

TL;DR

SportQA is a new benchmark with over 70,000 questions designed to evaluate and improve large language models' understanding of sports, from basic facts to complex reasoning, highlighting current limitations in LLMs' sports comprehension.

Contribution

We introduce SportQA, the first comprehensive sports knowledge benchmark for LLMs, enabling detailed assessment of their reasoning and understanding in sports contexts.

Findings

01. LLMs perform well on basic sports facts
02. LLMs struggle with complex, scenario-based sports reasoning
03. SportQA reveals gaps in current LLM capabilities

Abstract

A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP). This holds particular significance in the context of evaluating and advancing Large Language Models (LLMs), given the existing gap in specialized benchmarks. To bridge this gap, we introduce SportQA, a novel benchmark specifically designed for evaluating LLMs in the context of sports understanding. SportQA encompasses over 70,000 multiple-choice questions across three distinct difficulty levels, each targeting different aspects of sports knowledge from basic historical facts to intricate, scenario-based reasoning tasks. We conducted a thorough evaluation of prevalent LLMs, mainly utilizing few-shot learning paradigms supplemented by chain-of-thought (CoT) prompting. Our results reveal that while LLMs exhibit competent performance in basic sports…
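The evaluation setup the abstract describes (multiple-choice questions answered with few-shot, chain-of-thought prompting) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `MCQ` schema, the prompt layout, the `Answer:` parsing convention, and the `generate` callable standing in for a model are all assumptions made for the example, and the sample question is invented rather than taken from SportQA.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MCQ:
    """One multiple-choice item (hypothetical schema, not SportQA's actual format)."""
    question: str
    options: list[str]
    answer: str          # gold option letter, e.g. "B"
    rationale: str = ""  # worked reasoning, used only for few-shot exemplars


def format_item(item: MCQ) -> str:
    """Render the question followed by lettered options, one per line."""
    letters = "ABCDEFGH"
    lines = [f"Question: {item.question}"]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(item.options)]
    return "\n".join(lines)


def build_prompt(exemplars: list[MCQ], target: MCQ) -> str:
    """Few-shot CoT prompt: solved exemplars first, then the unsolved target."""
    blocks = [
        f"{format_item(ex)}\nReasoning: {ex.rationale}\nAnswer: {ex.answer}"
        for ex in exemplars
    ]
    # The prompt ends at "Reasoning:" so the model writes its reasoning first.
    blocks.append(f"{format_item(target)}\nReasoning:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer: X' letter in the completion, or None if absent."""
    matches = re.findall(r"Answer:\s*\(?([A-H])\)?", completion)
    return matches[-1] if matches else None


def accuracy(items: list[MCQ], exemplars: list[MCQ],
             generate: Callable[[str], str]) -> float:
    """Fraction of items whose parsed answer matches the gold letter."""
    correct = 0
    for item in items:
        completion = generate(build_prompt(exemplars, item))
        if extract_answer(completion) == item.answer:
            correct += 1
    return correct / len(items) if items else 0.0
```

To try it, pass any function that maps a prompt string to a completion string as `generate`; unparseable completions simply count as incorrect under this scoring choice.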

Code & Models

Repositories

haotianxia/sportqa (official)

Videos

SportQA: A Benchmark for Sports Understanding in Large Language Models · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Video Analysis and Summarization