Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu, Junchi Yan, Haoran Zhang, Lifang Wu, Liang Wang, Changwen Chen

TL;DR
This paper introduces a player-centric multimodal prompt generation network for basketball video captioning that accurately recognizes player identities and improves description quality, supported by a new large-scale NBA-Identity dataset.
Contribution
The paper presents a novel multimodal prompt generation approach focusing on player identity recognition, integrating visual and semantic features for improved sports video captioning.
Findings
Achieves state-of-the-art performance on NBA-Identity and VC-NBA-2022 datasets.
Constructs a large-scale NBA-Identity dataset with 9,726 videos.
Demonstrates effective integration of multimodal features for identity-aware captioning.
Abstract
Existing sports video captioning methods often focus on the action yet overlook player identities, limiting their applicability. Although some methods integrate extra information to generate identity-aware descriptions, the player identities are sometimes incorrect because the extra information is independent of the video content. This paper proposes a player-centric multimodal prompt generation network for identity-aware sports video captioning (LLM-IAVC), which focuses on recognizing player identities from a visual perspective. Specifically, an identity-related information extraction module (IRIEM) is designed to extract player-related multimodal embeddings. IRIEM includes a player identification network (PIN) for extracting visual features and player names, and a bidirectional semantic interaction module (BSIM) to link player features with video content for mutual enhancement.…
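The abstract says BSIM links player features with video content "for mutual enhancement" but gives no further detail here. One plausible reading, offered only as an illustrative sketch and not as the authors' implementation, is a pair of cross-attention passes: player features attend to video features, and video features attend to player features. The function names (`cross_attend`, `bidirectional_interaction`), the residual connection, the absence of learned projections, and the toy feature values below are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Update each query with an attention-weighted mix of keys_values.

    Hypothetical helper: scaled dot-product attention without learned
    projections, plus a residual connection back to the query.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Convex combination of the value vectors.
        mixed = [
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ]
        out.append([qi + mi for qi, mi in zip(q, mixed)])
    return out


def bidirectional_interaction(player_feats, video_feats):
    """Assumed two-way exchange: each stream is enhanced by the other."""
    enhanced_players = cross_attend(player_feats, video_feats)
    enhanced_video = cross_attend(video_feats, player_feats)
    return enhanced_players, enhanced_video


# Toy inputs (made up): 2 player embeddings and 3 video tokens, dimension 4.
player_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
video_feats = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
players_out, video_out = bidirectional_interaction(player_feats, video_feats)
```

Both outputs keep the shape of their inputs, so in a design like this the enhanced player and video features could be passed on as prompt embeddings for the language model. How LLM-IAVC actually builds its prompts is not described in the text above.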
