Language and Multimodal Models in Sports: A Survey of Datasets and Applications
Haotian Xia, Zhengbang Yang, Yun Zhao, Yuqing Wang, Jingxi Li, Rhys Tracy, Zhuangdi Zhu, Yuan-fang Wang, Hanjie Chen, Weining Shen

TL;DR
This survey reviews recent datasets and applications integrating NLP and multimodal models in sports analytics, highlighting dataset types, contributions to various applications, and future research directions.
Contribution
It categorizes sports datasets into language-based, multimodal, and convertible types, and discusses their roles in advancing sports analytics applications.
Findings
Datasets are categorized into three primary types: language-based, multimodal, and convertible.
Multimodal datasets enable richer applications like tactical analysis and medical diagnostics.
Future datasets should focus on diversity, quality, and real-time processing capabilities.
Abstract
The recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We overview and categorize datasets into three primary types: language-based, multimodal, and convertible datasets. Language-based datasets support tasks involving text, while multimodal datasets support tasks involving several modalities (e.g., text, video, audio). Convertible datasets, initially single-modal (video), can be enriched with additional annotations, such as explanations of actions and video descriptions, to become multimodal, offering future potential for richer and more diverse applications. Our study highlights the contributions of these datasets to various applications, from improving fan experiences to supporting tactical analysis and medical diagnostics. We…
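The three-way categorization described in the abstract can be illustrated with a minimal sketch. Everything named here (DatasetType, SportsDataset, categorize, enrich, and the example dataset name) is hypothetical and not taken from the survey; the rule that any single non-text modality counts as convertible is an assumption generalizing the abstract's video example.

```python
from dataclasses import dataclass, field
from enum import Enum


class DatasetType(Enum):
    LANGUAGE_BASED = "language-based"
    MULTIMODAL = "multimodal"
    CONVERTIBLE = "convertible"


@dataclass
class SportsDataset:
    # Hypothetical record; the survey does not define a schema.
    name: str
    modalities: set[str]
    annotations: list[str] = field(default_factory=list)


def categorize(ds: SportsDataset) -> DatasetType:
    """Map a dataset to one of the three categories by its modalities."""
    if not ds.modalities:
        raise ValueError("a dataset needs at least one modality")
    if len(ds.modalities) > 1:
        return DatasetType.MULTIMODAL
    if ds.modalities == {"text"}:
        return DatasetType.LANGUAGE_BASED
    # Assumption: a single non-text modality (e.g. video) is convertible.
    return DatasetType.CONVERTIBLE


def enrich(ds: SportsDataset, annotation: str) -> SportsDataset:
    """Return a copy with a text annotation layer added to the dataset."""
    return SportsDataset(
        name=ds.name,
        modalities=ds.modalities | {"text"},
        annotations=ds.annotations + [annotation],
    )


clips = SportsDataset("example_clips", {"video"})
enriched = enrich(clips, "action explanations")
print(categorize(clips).value, "->", categorize(enriched).value)
```

Under these assumptions, a video-only dataset starts out as convertible and is classified as multimodal once text annotations such as action explanations are attached.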
Taxonomy
Topics: Natural Language Processing Techniques
