BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing

Kaiwen Wang; Kaili Zheng; Rongrong Deng; Yiming Shi; Chenyi Guo; Ji Wu

arXiv:2604.04419·cs.CV·April 7, 2026

TL;DR

This paper introduces BoxComm, a new boxing video dataset with professional commentary, and proposes novel evaluation methods for category-aware commentary generation and rhythm assessment, revealing current models' limitations.

Contribution

The paper presents the first category-level annotation for combat sports commentary and develops two evaluation metrics tailored to boxing commentary generation.

Findings

1. Current models struggle with category-conditioned generation.

2. Structured action cues improve commentary quality.

3. BoxComm provides a new benchmark for combat sports commentary.

Abstract

Recent multimodal large language models (MLLMs) have shown strong capabilities in general video understanding, driving growing interest in automatic sports commentary generation. However, existing benchmarks for this task focus exclusively on team sports such as soccer and basketball, leaving combat sports entirely unexplored. Notably, combat sports present distinct challenges: critical actions unfold within milliseconds with visually subtle yet semantically decisive differences, and professional commentary contains a substantially higher proportion of tactical analysis compared to team sports. In this paper, we present BoxComm, a large-scale dataset comprising 445 World Boxing Championship match videos with over 52K commentary sentences from professional broadcasts. We propose a structured commentary taxonomy that categorizes each sentence into play-by-play, tactical, or contextual,…
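The three-way taxonomy named in the abstract (play-by-play, tactical, contextual) can be pictured as a small labeled-sentence structure. The following is a minimal sketch, not the authors' data format: the class names, the `category_proportions` helper, and the sample sentences are all hypothetical and made up for illustration. It only shows how a per-category share, such as the proportion of tactical analysis the abstract refers to, could be computed from labeled sentences.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three sentence categories named in the abstract."""
    PLAY_BY_PLAY = "play-by-play"
    TACTICAL = "tactical"
    CONTEXTUAL = "contextual"


@dataclass(frozen=True)
class CommentarySentence:
    """Hypothetical record layout; the real dataset schema may differ."""
    text: str
    category: Category


def category_proportions(sentences):
    """Return the share of sentences in each category (0.0 for empty input)."""
    total = len(sentences)
    if total == 0:
        return {c: 0.0 for c in Category}
    counts = Counter(s.category for s in sentences)
    return {c: counts.get(c, 0) / total for c in Category}


# Made-up sentences for illustration only; not taken from BoxComm.
sample = [
    CommentarySentence("Left jab lands clean.", Category.PLAY_BY_PLAY),
    CommentarySentence("He is cutting off the ring to force exchanges.", Category.TACTICAL),
    CommentarySentence("She keeps her guard high against the southpaw.", Category.TACTICAL),
    CommentarySentence("This is his third championship appearance.", Category.CONTEXTUAL),
]

proportions = category_proportions(sample)
for category, share in proportions.items():
    print(f"{category.value}: {share:.2f}")
```

Using an `Enum` keeps the label set closed, so a mislabeled category fails at construction time rather than silently creating a fourth bucket.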
