SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Haotian Xia, Zhengbang Yang, Junbo Zou, Rhys Tracy, Yuqing Wang, Chi Lu, Christopher Lai, Yanjun He, Xun Shao, Zhuoqing Xie, Yuan-fang Wang, Weining Shen, Hanjie Chen

TL;DR
SPORTU is a new benchmark designed to evaluate multimodal large language models' reasoning abilities in sports, covering rule comprehension, strategy, and complex video-based tasks, highlighting current models' limitations.
Contribution
Introduces SPORTU, a comprehensive sports understanding benchmark with textual and video components, enabling detailed evaluation of MLLMs' reasoning in sports scenarios.
Findings
GPT-4o achieves 71% accuracy on SPORTU-text questions.
Models struggle with complex reasoning and rule-based tasks.
Claude-3.5-Sonnet achieves 52.6% accuracy on hard-level SPORTU-video tasks.
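The accuracy figures above are proportions of correctly answered multiple-choice questions within a subset. A minimal sketch of such a tally follows, assuming a hypothetical record layout with `difficulty`, `answer`, and `prediction` fields; this is not SPORTU's actual data format or evaluation code, and the toy records are invented for illustration.

```python
from collections import defaultdict

def accuracy_by_group(records, key="difficulty"):
    """Return {group: fraction correct} for multiple-choice records.

    Each record is a dict with hypothetical fields: the grouping key,
    the gold option letter ("answer"), and the model's chosen option
    letter ("prediction").
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy records, invented for illustration only.
records = [
    {"difficulty": "easy", "answer": "A", "prediction": "A"},
    {"difficulty": "easy", "answer": "C", "prediction": "c "},
    {"difficulty": "hard", "answer": "B", "prediction": "D"},
    {"difficulty": "hard", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_group(records)
```

Grouping by a key such as difficulty keeps weak results on hard items from being averaged away by easy ones.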
Abstract
Multimodal Large Language Models (MLLMs) are advancing the ability to reason about complex sports scenarios by integrating textual and visual information. To comprehensively evaluate their capabilities, we introduce SPORTU, a benchmark designed to assess MLLMs across multi-level sports reasoning tasks. SPORTU comprises two key components. SPORTU-text features 900 multiple-choice questions with human-annotated explanations for rule comprehension and strategy understanding; it tests models' ability to reason about sports solely through question answering (QA), without requiring visual inputs. SPORTU-video consists of 1,701 slow-motion video clips across 7 different sports and 12,048 QA pairs, designed to assess multi-level reasoning, from simple sports recognition to complex tasks like foul detection and rule application. We evaluate four prevalent LLMs…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
