SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

TL;DR
SoccerNet-Echoes is a new dataset that combines soccer game videos with automatically generated transcriptions of audio commentary, enabling advanced sports analytics and applications like highlight generation and game summarization.
Contribution
This paper introduces SoccerNet-Echoes, which augments the SoccerNet dataset with commentary transcriptions generated by the Whisper ASR model and translated with Google Translate, facilitating multimodal sports analysis.
Findings
Augmenting the dataset with audio transcriptions is intended to support enhanced action spotting.
Supports automatic captioning and game summarization.
Enables multimodal sports analytics research.
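As a rough illustration of how timestamped commentary might feed an action-spotting pipeline, the sketch below collects the commentary segments that fall near a candidate event time. The `Segment` schema, the `commentary_near` helper, the 10-second window, and the sample sentences are all assumptions made for this example; they are not the dataset's actual file format or the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed commentary segment (hypothetical schema)."""
    start: float  # seconds from the start of the half
    end: float    # seconds from the start of the half
    text: str


def commentary_near(segments: List[Segment], event_time: float,
                    window: float = 10.0) -> List[Segment]:
    """Return segments overlapping [event_time - window, event_time + window]."""
    lo, hi = event_time - window, event_time + window
    return [s for s in segments if s.end >= lo and s.start <= hi]


# Made-up sample data, for illustration only.
segments = [
    Segment(0.0, 4.5, "Welcome to the second half."),
    Segment(118.0, 123.0, "He shoots from distance!"),
    Segment(124.0, 129.5, "What a goal!"),
    Segment(300.0, 305.0, "A quiet spell in midfield."),
]

# Gather the text around a candidate event at t = 125 s.
nearby = commentary_near(segments, event_time=125.0, window=10.0)
context = " ".join(s.text for s in nearby)
print(context)
```

The joined `context` string could then be passed to a text encoder alongside the video features; the window size is a tunable choice, since commentators often react a few seconds after the action.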
Abstract
The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data…
Taxonomy
Topics: Music and Audio Processing · Sports Analytics and Performance · Video Analysis and Summarization
