SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain
Ruochen Zhang, Carsten Eickhoff

TL;DR
This paper introduces a new dataset and task for tracking in-game events in soccer commentary, highlighting the challenges of state recognition in loosely defined and sparse discourse settings.
Contribution
It presents a curated dataset of soccer match commentaries with annotated events and formulates a novel state tracking task in this domain.
Findings
Existing models struggle with broad state definitions.
The dataset enables analysis of state recognition in natural commentary.
Baseline methods show limited performance on the task.
Abstract
In the pursuit of natural language understanding, there has been a long-standing interest in tracking state changes throughout narratives. Impressive progress has been made in modeling the state of transaction-centric dialogues and procedural texts. However, this problem has been less intensively studied in the realm of general discourse where ground truth descriptions of states may be loosely defined and state changes are less densely distributed over utterances. This paper proposes to turn to simplified, fully observable systems that show some of these properties: sports events. We curated 2,263 soccer matches including time-stamped natural language commentary accompanied by discrete events such as a team scoring goals, switching players or being penalized with cards. We propose a new task formulation where, given paragraphs of commentary of a game at different timestamps, the system…
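To make the data description above concrete, the following is a minimal sketch of how time-stamped commentary with attached discrete events might be represented. Everything in it is an assumption for illustration: the class `CommentaryEntry`, the helper `events_in_window`, the field names, the label strings, and the example commentary lines are hypothetical and are not taken from the dataset or the paper's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical label set: the abstract mentions goals, player switches
# and cards only as examples of discrete events.
EVENT_TYPES = ("goal", "substitution", "card")


@dataclass
class CommentaryEntry:
    """One time-stamped commentary utterance (assumed schema)."""
    minute: int
    text: str
    # Maps a team name to the event types that occurred at this timestamp.
    # Empty for utterances that carry no state change.
    events: Dict[str, List[str]] = field(default_factory=dict)


def events_in_window(
    entries: List[CommentaryEntry], start: int, end: int
) -> Dict[Tuple[str, str], int]:
    """Count events per (team, event type) with start <= minute <= end."""
    counts: Dict[Tuple[str, str], int] = {}
    for entry in entries:
        if start <= entry.minute <= end:
            for team, event_types in entry.events.items():
                for event_type in event_types:
                    key = (team, event_type)
                    counts[key] = counts.get(key, 0) + 1
    return counts


# Made-up commentary lines, for illustration only.
match = [
    CommentaryEntry(3, "Bright start from the hosts, plenty of early pressure."),
    CommentaryEntry(17, "Header at the far post and the home side lead!",
                    {"home": ["goal"]}),
    CommentaryEntry(41, "Late challenge in midfield, the referee reaches for his pocket.",
                    {"away": ["card"]}),
    CommentaryEntry(60, "A quiet spell, both teams keeping the ball."),
]

first_half = events_in_window(match, 0, 45)

# Share of utterances with no event attached: a rough view of how
# sparsely state changes are distributed over the commentary.
sparse_share = sum(1 for entry in match if not entry.events) / len(match)
```

In this toy match, half of the utterances carry no event, which mirrors the sparsity the abstract describes: a tracker has to read through commentary that often does not change the state at all.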
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
