Commentary Generation for Soccer Highlights
Chidaksh Ravuru

TL;DR
This paper extends neural commentary generation models to soccer highlights, evaluating their performance on the GOAL dataset and analyzing factors affecting their accuracy and generalization.
Contribution
The paper adapts the MatchVoice model to short soccer highlight clips, provides an extensive experimental analysis, and investigates how training configurations and window sizes affect performance.
Findings
MatchVoice shows promising generalization to soccer highlights
Training configurations significantly affect performance
Varying window sizes impacts zero-shot capabilities
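The window-size finding concerns how much video context around each commentary timestamp the model receives. The sketch below illustrates that idea only. It assumes frame features sampled at a fixed rate and a window centered on the timestamp, and neither detail is specified here. The function `extract_window` is hypothetical and not part of MatchVoice.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def extract_window(frames: Sequence[T], fps: float, timestamp: float,
                   window_seconds: float) -> List[T]:
    """Hypothetical helper: return the frames inside a window of
    `window_seconds` centered on `timestamp` (in seconds).

    Assumes `frames` are sampled uniformly at `fps` frames per second
    from the start of the clip.
    """
    if fps <= 0 or window_seconds <= 0:
        raise ValueError("fps and window_seconds must be positive")
    half = window_seconds / 2.0
    # Convert window edges from seconds to frame indices.
    start = int(round((timestamp - half) * fps))
    end = int(round((timestamp + half) * fps))
    # Clip to the clip boundaries; short highlights often cannot
    # supply a full window on both sides of the timestamp.
    start = max(0, start)
    end = min(len(frames), end)
    return list(frames[start:end])


if __name__ == "__main__":
    clip = list(range(100))  # 50 s of stand-in features at 2 fps
    print(len(extract_window(clip, fps=2, timestamp=10, window_seconds=4)))
```

With short highlight clips, a large window is truncated at the clip edges. So increasing the window size does not always add context, which is one plausible reason window size could interact with zero-shot behavior.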
Abstract
Automated soccer commentary generation has evolved from template-based systems to advanced neural architectures, aiming to produce real-time descriptions of sports events. While frameworks like SoccerNet-Caption laid the groundwork, their inability to achieve fine-grained alignment between video content and commentary remains a significant challenge. Recent efforts such as MatchTime, with its MatchVoice model, address this issue through coarse- and fine-grained alignment techniques, achieving improved temporal synchronization. In this paper, we extend MatchVoice to commentary generation for soccer highlights using the GOAL dataset, which emphasizes short clips rather than entire games. We conduct extensive experiments to reproduce the original MatchTime results and evaluate our setup, highlighting the impact of different training configurations and hardware limitations. Furthermore, we…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition
