Enhancing Sports Strategy with Video Analytics and Data Mining: Assessing the effectiveness of Multimodal LLMs in tennis video analysis
Charlton Teo

TL;DR
This paper evaluates the effectiveness of Multimodal Large Language Models in analyzing tennis videos, aiming to improve event classification and sequence understanding in sports analytics.
Contribution
It introduces a novel assessment of MLLMs for tennis video analysis, focusing on event classification and sequence identification, addressing a key gap in sports analytics.
Findings
MLLMs can classify tennis actions with moderate accuracy.
Combining MLLMs with traditional models improves performance.
Training methods significantly impact MLLMs' effectiveness.
Abstract
The rise of Large Language Models (LLMs) in recent years has led to the development of Multimodal LLMs (MLLMs), which can process images, videos and even audio alongside textual inputs. In this project, we aim to assess the effectiveness of MLLMs in analysing sports videos, focusing mainly on tennis. Despite existing research on tennis analysis, there remains a gap in models that can understand and identify the sequence of events in a tennis rally, a capability that would be useful in other areas of sports analytics. We therefore assess the MLLMs mainly on their ability to fill this gap: to classify tennis actions, and to identify those actions within the sequence of actions in a rally. We further look into ways to improve the MLLMs' performance, including different training methods and even using them together with other…
