GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Ji Qi; Jifan Yu; Teng Tu; Kunyu Gao; Yifan Xu; Xinyu Guan; Xiaozhi Wang; Yuxiao Dong; Bin Xu; Lei Hou; Juanzi Li; Jie Tang; Weidong Guo; Hui Liu; Yu Xu

arXiv:2303.14655 · cs.CV · October 6, 2023 · 1 citation

TL;DR

GOAL is a new benchmark dataset for knowledge-grounded video captioning in soccer. It targets the challenge of generating detailed, knowledge-informed commentary for sports videos.

Contribution

The paper presents GOAL, a large-scale, challenging dataset for knowledge-grounded video captioning, and evaluates existing methods to highlight the task's difficulty and future directions.

Findings

01. Existing methods struggle with the task complexity.

02. The dataset enables research on knowledge integration in video captioning.

03. Baseline results indicate significant room for improvement.

Abstract

Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions grounded in background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) is still far from solved, even though it has valuable applications such as automatic sports narration. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, which defines a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). Moreover, we experimentally adapt existing methods to show the difficulty of this task and potential directions for solving it. Our data and code are available at https://github.com/THU-KEG/goal.
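To make the three components named in the abstract (video clips, commentary sentences, knowledge triples) concrete, here is a minimal Python sketch of how one KGVC sample could be represented. The class name `KGVCSample`, its fields, the helper `grounded_triples`, and the placeholder entities are all hypothetical illustrations; they are not taken from the GOAL repository and do not describe its actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema for illustration only; not the actual GOAL data format.
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class KGVCSample:
    clip_id: str                 # identifier of the soccer video clip
    commentary: List[str]        # commentary sentences aligned with the clip
    knowledge: List[Triple] = field(default_factory=list)  # background triples


def grounded_triples(sample: KGVCSample) -> List[Triple]:
    """Return triples whose head or tail entity is mentioned in the commentary."""
    text = " ".join(sample.commentary).lower()
    return [
        t for t in sample.knowledge
        if t[0].lower() in text or t[2].lower() in text
    ]


# Placeholder sample; entity names are invented, not real dataset content.
sample = KGVCSample(
    clip_id="clip_0001",
    commentary=["Player A passes to Player B on the left wing."],
    knowledge=[
        ("Player A", "plays_for", "Team X"),
        ("Player C", "plays_for", "Team Y"),
    ],
)

print(grounded_triples(sample))
```

The simple substring match is only a stand-in for whatever entity linking a real system would use; it shows the idea that a caption is "knowledge-grounded" when it draws on triples about entities that appear in the scene.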

Code & Models

Repositories

thu-keg/goal (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition