TrajSV: A Trajectory-based Model for Sports Video Representations and Applications
Zheng Wang, Shihao Xu, Wei Shi

TL;DR
TrajSV introduces a trajectory-based framework for sports video analysis that effectively captures player and ball movements, enabling improved retrieval, action spotting, and captioning across multiple sports without requiring extensive supervision.
Contribution
The paper presents TrajSV, a novel trajectory-based model that leverages unsupervised learning and a Transformer architecture for comprehensive sports video representations.
Findings
Achieves nearly 70% improvement in sports video retrieval
Outperforms baselines in 9 out of 17 action categories for action spotting
Demonstrates nearly 20% improvement in video captioning
Abstract
Sports analytics has received significant attention from both academia and industry in recent years. Despite the growing interest and efforts in this field, several issues remain unresolved, including (1) data unavailability, (2) lack of an effective trajectory-based framework, and (3) requirement for sufficient supervision labels. In this paper, we present TrajSV, a trajectory-based framework that addresses various issues in existing studies. TrajSV comprises three components: data preprocessing, Clip Representation Network (CRNet), and Video Representation Network (VRNet). The data preprocessing module extracts player and ball trajectories from sports broadcast videos. CRNet utilizes a trajectory-enhanced Transformer module to learn clip representations based on these trajectories. Additionally, VRNet learns video representations by aggregating clip representations and visual features…
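The data flow described in the abstract (trajectories per clip, then a clip representation, then a video representation that aggregates clip representations with visual features) can be illustrated with a minimal sketch. This is not the authors' implementation: the trajectory-enhanced Transformer in CRNet and the aggregation in VRNet are replaced here by simple averaging so that the example stays runnable with the standard library alone. The names `Clip`, `clip_representation`, and `video_representation` are hypothetical, as is the assumption that each clip carries one fixed-length visual feature vector.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) position in one frame
Trajectory = List[Point]      # path of one player or the ball within a clip


@dataclass
class Clip:
    # Assumed output of a preprocessing step that extracts trajectories.
    trajectories: List[Trajectory]
    # Assumed per-clip visual feature; every clip must use the same length.
    visual_feature: List[float]


def clip_representation(clip: Clip) -> List[float]:
    """Stand-in for CRNet: mean per-step displacement over all trajectories.

    The paper uses a trajectory-enhanced Transformer here; averaging is only
    a placeholder that keeps the input/output shapes easy to follow.
    """
    dx_sum = dy_sum = 0.0
    steps = 0
    for traj in clip.trajectories:
        # Pair each point with its successor to get frame-to-frame motion.
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            dx_sum += x1 - x0
            dy_sum += y1 - y0
            steps += 1
    if steps == 0:
        return [0.0, 0.0]
    return [dx_sum / steps, dy_sum / steps]


def video_representation(clips: List[Clip]) -> List[float]:
    """Stand-in for VRNet: mean-pool the fused vector of every clip.

    Each fused vector is the clip representation followed by the clip's
    visual feature.
    """
    if not clips:
        raise ValueError("a video needs at least one clip")
    fused = [clip_representation(c) + list(c.visual_feature) for c in clips]
    dim = len(fused[0])
    return [sum(vec[i] for vec in fused) / len(fused) for i in range(dim)]
```

In this sketch a video is just a list of `Clip` objects, and the resulting vector could be compared across videos (for example with cosine similarity) for a retrieval-style task. A learned model would replace both averaging steps.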
