Commentary Generation for Soccer Highlights

Chidaksh Ravuru

arXiv:2508.07543 · cs.CV · August 12, 2025

TL;DR

This paper extends neural commentary generation models to soccer highlights, evaluating their performance on the GOAL dataset and analyzing factors affecting their accuracy and generalization.

Contribution

It adapts the MatchVoice model for short soccer highlight clips, provides extensive experimental analysis, and investigates the impact of training configurations and window sizes.

Findings

01. MatchVoice shows promising generalization to soccer highlights

02. Training configurations significantly affect performance

03. Varying window sizes impacts zero-shot capabilities
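
To make the notion of a "window size" concrete, the sketch below shows one plausible way to pick the video frames that fall inside a temporal window centred on a commentary timestamp. It is an illustration only, not the paper's or MatchVoice's implementation: the function name `window_frame_indices`, the centred-window convention, and the default sampling rate of 2 frames per second are all assumptions made here.

```python
import math


def window_frame_indices(timestamp_s, window_s, fps=2.0, clip_len_s=None):
    """Return indices of frames sampled at `fps` that fall inside a window
    of `window_s` seconds centred on `timestamp_s` (hypothetical helper).

    The window is clamped to the start of the clip and, if `clip_len_s`
    is given, to the end of the clip as well.
    """
    half = window_s / 2.0
    start = max(0.0, timestamp_s - half)   # clamp to the clip start
    end = timestamp_s + half
    if clip_len_s is not None:
        end = min(end, clip_len_s)         # clamp to the clip end
    first = math.ceil(start * fps)         # first frame at or after `start`
    last = math.floor(end * fps)           # last frame at or before `end`
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A wider window yields more frames around the same timestamp.
    for window in (4.0, 10.0, 30.0):
        frames = window_frame_indices(timestamp_s=20.0, window_s=window)
        print(window, len(frames))
```

Under these assumptions, a short highlight clip truncates large windows at its boundaries, which is one reason the choice of window size might matter more for highlights than for full-match video.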

Abstract

Automated soccer commentary generation has evolved from template-based systems to advanced neural architectures, aiming to produce real-time descriptions of sports events. While frameworks like SoccerNet-Caption laid foundational work, their inability to achieve fine-grained alignment between video content and commentary remains a significant challenge. Recent efforts such as MatchTime, with its MatchVoice model, address this issue through coarse and fine-grained alignment techniques, achieving improved temporal synchronization. In this paper, we extend MatchVoice to commentary generation for soccer highlights using the GOAL dataset, which emphasizes short clips over entire games. We conduct extensive experiments to reproduce the original MatchTime results and evaluate our setup, highlighting the impact of different training configurations and hardware limitations. Furthermore, we…

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition