Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning

Zeyu Xi; Haoying Sun; Yaofei Wu; Junchi Yan; Haoran Zhang; Lifang Wu; Liang Wang; Changwen Chen

arXiv:2507.20163·cs.CV·July 29, 2025

TL;DR

This paper introduces a player-centric multimodal prompt generation network for basketball video captioning that accurately recognizes player identities and improves description quality, supported by a new large-scale NBA-Identity dataset.

Contribution

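The paper presents LLM-IAVC, a player-centric multimodal prompt generation network that recognizes player identities from the video itself rather than from extra information independent of the video content. Its identity-related information extraction module (IRIEM) combines a player identification network (PIN), which extracts visual features and player names, with a bidirectional semantic interaction module (BSIM), which links player features with video content.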

Findings

01 Achieves state-of-the-art performance on NBA-Identity and VC-NBA-2022 datasets.

02 Constructs a large-scale NBA-Identity dataset with 9,726 videos.

03 Demonstrates effective integration of multimodal features for identity-aware captioning.

Abstract

Existing sports video captioning methods often focus on the action yet overlook player identities, limiting their applicability. Although some methods integrate extra information to generate identity-aware descriptions, the player identities are sometimes incorrect because the extra information is independent of the video content. This paper proposes a player-centric multimodal prompt generation network for identity-aware sports video captioning (LLM-IAVC), which focuses on recognizing player identities from a visual perspective. Specifically, an identity-related information extraction module (IRIEM) is designed to extract player-related multimodal embeddings. IRIEM includes a player identification network (PIN) for extracting visual features and player names, and a bidirectional semantic interaction module (BSIM) to link player features with video content for mutual enhancement.…
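The abstract says BSIM links player features with video content "for mutual enhancement" but does not specify the mechanism. One common way to realize that kind of two-way interaction is bidirectional cross-attention, sketched below in plain Python. This is an illustrative assumption, not the paper's BSIM: the function names (`cross_attend`, `bidirectional_interaction`), the residual connection, the absence of learned projections, and the toy feature vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Scaled dot-product attention with a residual connection.

    Each query vector attends over `keys_values` (used as both keys and
    values, with no learned projections) and the attended context is
    added back to the query.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        context = [
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ]
        out.append([qi + ci for qi, ci in zip(q, context)])
    return out


def bidirectional_interaction(player_feats, video_feats):
    """Hypothetical two-way interaction: each side attends to the other."""
    enhanced_players = cross_attend(player_feats, video_feats)
    enhanced_video = cross_attend(video_feats, player_feats)
    return enhanced_players, enhanced_video


# Toy inputs (made up for illustration): 2 player embeddings, 3 frame embeddings.
player_feats = [[1.0, 0.0], [0.0, 1.0]]
video_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
players, video = bidirectional_interaction(player_feats, video_feats)
```

Both outputs keep the shape of their inputs, so in a design like this the enhanced player and video features could be passed on to whatever stage builds the prompt. A real model would add learned query, key, and value projections and typically multiple attention heads.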
