SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding
Sushant Gautam, Cise Midoglu, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

TL;DR
SoccerChat is a multimodal AI framework that combines visual and textual data to improve soccer game understanding, event classification, and referee decision analysis, advancing sports analytics with interactive and explainable AI.
Contribution
We introduce SoccerChat, a novel multimodal conversational AI that integrates visual and textual data for comprehensive soccer video analysis, leveraging the SoccerNet dataset and fine-tuning on structured instructions.
Findings
Effective in classifying soccer events
Maintains competitive accuracy in referee decision tasks
Highlights the importance of multimodal data integration
Abstract
The integration of artificial intelligence in sports analytics has transformed soccer video understanding, enabling real-time, automated insights into complex game dynamics. Traditional approaches rely on isolated data streams, limiting their effectiveness in capturing the full context of a match. To address this, we introduce SoccerChat, a multimodal conversational AI framework that integrates visual and textual data for enhanced soccer video comprehension. Leveraging the extensive SoccerNet dataset, enriched with jersey color annotations and automatic speech recognition (ASR) transcripts, SoccerChat is fine-tuned on a structured video instruction dataset to facilitate accurate game understanding, event classification, and referee decision making. We benchmark SoccerChat on action classification and referee decision-making tasks, demonstrating its performance in general soccer event…
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Sports Analytics and Performance
