RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis
Linfeng Dong, Yuchen Yang, Hao Wu, Wei Wang, Yuenan Hou, Zhihang Zhong, Xiao Sun

TL;DR
RacketVision introduces a comprehensive dataset and benchmark for multi-task sports analytics across racket sports, emphasizing the importance of multimodal fusion techniques like CrossAttention for improved trajectory prediction.
Contribution
RacketVision provides the first large-scale dataset with fine-grained racket pose annotations alongside ball positions, and shows that CrossAttention fusion is needed for racket pose features to improve ball trajectory prediction.
Findings
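Naively concatenating racket pose features with ball features degrades trajectory prediction performance.
Fusing the same racket pose features through CrossAttention yields trajectory predictions that surpass strong unimodal baselines.
The benchmark covers ball tracking, racket pose estimation, and ball trajectory forecasting across table tennis, tennis, and badminton, giving future sports analytics research a common starting point.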
Abstract
We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide large-scale, fine-grained annotations for racket pose alongside traditional ball positions, enabling research into complex human-object interactions. It is designed to tackle three interconnected tasks: fine-grained ball tracking, articulated racket pose estimation, and predictive ball trajectory forecasting. Our evaluation of established baselines reveals a critical insight for multi-modal fusion: while naively concatenating racket pose features degrades performance, a CrossAttention mechanism is essential to unlock their value, leading to trajectory prediction results that surpass strong unimodal baselines. RacketVision provides a versatile resource and a strong starting point for future research…
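The contrast between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it is a generic single-head cross-attention written with the Python standard library only, and every name, dimension, and weight below (`concat_fusion`, `cross_attention_fusion`, 2-D ball features, 10-D racket features, random projection matrices) is an assumption made for illustration. In the sketch, ball features act as queries over racket pose features, and the attended result is added back to the ball features, whereas concatenation simply widens each frame's feature vector.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for features or learned weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def concat_fusion(ball, racket):
    """Naive fusion: append racket features to ball features, frame by frame."""
    return [b + r for b, r in zip(ball, racket)]


def cross_attention_fusion(ball, racket, w_q, w_k, w_v):
    """Ball frames query racket frames; attended values are added residually."""
    q = matmul(ball, w_q)    # (T, d_att) queries from the ball stream
    k = matmul(racket, w_k)  # (T, d_att) keys from the racket stream
    v = matmul(racket, w_v)  # (T, d_ball) values projected to ball width
    k_t = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, k_t)  # (T, T) ball-to-racket affinities
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    fused = [[b + a for b, a in zip(b_row, a_row)]
             for b_row, a_row in zip(ball, attended)]
    return fused, weights


# Hypothetical sizes: 4 frames, 2-D ball position, 10-D racket pose vector.
rng = random.Random(0)
T, D_BALL, D_RACKET, D_ATT = 4, 2, 10, 8
ball = random_matrix(T, D_BALL, rng)
racket = random_matrix(T, D_RACKET, rng)
w_q = random_matrix(D_BALL, D_ATT, rng)
w_k = random_matrix(D_RACKET, D_ATT, rng)
w_v = random_matrix(D_RACKET, D_BALL, rng)

concat = concat_fusion(ball, racket)
fused, weights = cross_attention_fusion(ball, racket, w_q, w_k, w_v)
print(len(concat[0]), len(fused[0]))
```

With concatenation, the downstream predictor receives a wider vector and must work out on its own which racket dimensions matter. With cross-attention, each ball frame receives a weighted summary of the racket frames and keeps its original width. In practice the projections would be learned and implemented in a tensor library; the random weights here only demonstrate the data flow.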
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Sports Dynamics and Biomechanics
