SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

Haotian Xia; Zhengbang Yang; Junbo Zou; Rhys Tracy; Yuqing Wang; Chi Lu; Christopher Lai; Yanjun He; Xun Shao; Zhuoqing Xie; Yuan-fang Wang; Weining Shen; Hanjie Chen

arXiv:2410.08474 · cs.CV · March 18, 2025 · 2 cites

TL;DR

SPORTU is a new benchmark designed to evaluate multimodal large language models' reasoning abilities in sports, covering rule comprehension, strategy, and complex video-based tasks, highlighting current models' limitations.

Contribution

Introduces SPORTU, a comprehensive sports understanding benchmark with textual and video components, enabling detailed evaluation of MLLMs' reasoning in sports scenarios.

Findings

01 GPT-4o achieves 71% accuracy on text questions.

02 Models struggle with complex reasoning and rule-based tasks.

03 Claude-3.5-Sonnet achieves 52.6% on hard video tasks.

Abstract

Multimodal Large Language Models (MLLMs) are advancing the ability to reason about complex sports scenarios by integrating textual and visual information. To comprehensively evaluate their capabilities, we introduce SPORTU, a benchmark designed to assess MLLMs across multi-level sports reasoning tasks. SPORTU comprises two key components. SPORTU-text features 900 multiple-choice questions with human-annotated explanations for rule comprehension and strategy understanding; it tests models' ability to reason about sports solely through question answering (QA), without requiring visual inputs. SPORTU-video consists of 1,701 slow-motion video clips across 7 different sports and 12,048 QA pairs, designed to assess multi-level reasoning, from simple sports recognition to complex tasks like foul detection and rule application. We evaluate four prevalent LLMs…
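The accuracy figures reported for a multiple-choice benchmark of this kind can be illustrated with a short scoring sketch. This is a minimal example under stated assumptions, not the authors' evaluation code: the `MCQuestion` schema, the `difficulty` field, the function names, and the toy items are all hypothetical and do not reflect SPORTU's actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MCQuestion:
    """One multiple-choice item (hypothetical schema, not SPORTU's real format)."""
    question: str
    options: Dict[str, str]   # option letter -> option text
    answer: str               # gold option letter, e.g. "B"
    difficulty: str = "easy"  # hypothetical difficulty label


def accuracy(questions: List[MCQuestion], predictions: List[str]) -> float:
    """Fraction of questions whose predicted letter matches the gold letter."""
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    if not questions:
        return 0.0
    correct = sum(
        pred.strip().upper() == q.answer.strip().upper()
        for q, pred in zip(questions, predictions)
    )
    return correct / len(questions)


def accuracy_by_difficulty(
    questions: List[MCQuestion], predictions: List[str]
) -> Dict[str, float]:
    """Accuracy computed separately for each difficulty label."""
    groups: Dict[str, tuple] = {}
    for q, pred in zip(questions, predictions):
        qs, preds = groups.setdefault(q.difficulty, ([], []))
        qs.append(q)
        preds.append(pred)
    return {level: accuracy(qs, preds) for level, (qs, preds) in groups.items()}


# Invented toy items, for illustration only.
toy_questions = [
    MCQuestion("Which sport is shown?", {"A": "Basketball", "B": "Volleyball"}, "A", "easy"),
    MCQuestion("Was a foul committed?", {"A": "Yes", "B": "No"}, "B", "hard"),
    MCQuestion("Which rule applies?", {"A": "Rule X", "B": "Rule Y"}, "A", "hard"),
]
toy_predictions = ["a", "B ", "B"]  # model outputs, normalised before comparison

print(accuracy(toy_questions, toy_predictions))
print(accuracy_by_difficulty(toy_questions, toy_predictions))
```

Grouping by a difficulty label is one simple way to report separate scores for easier recognition questions and harder rule-application questions; how the benchmark itself defines and assigns difficulty is not specified here.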

Code & Models

Repositories

haotianxia/SPORTU · Official

Videos

SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling