BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization
Rahul Kumar, Vipul Baghel, Sudhanshu Singh, Bikash Kumar Badatya, Shivam Yadav, Babji Srinivasan, Ravi Hegde

TL;DR
BoxingVI introduces a comprehensive, annotated video dataset of boxing punches designed to advance real-time action recognition, classification, and localization in unstructured, low-resource environments.
Contribution
The paper presents a new boxing punch dataset of 6,915 manually segmented and labeled clips, drawn from 20 publicly available YouTube sparring sessions involving 18 athletes, to support research in combat sports analysis and automated coaching.
Findings
Dataset includes 6,915 punch clips across six punch types.
Captures diverse motion styles, angles, and athlete physiques.
Supports development of robust action recognition models.
Abstract
Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructured nature of actions and variations in recording environments. In this work, we present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types, extracted from 20 publicly available YouTube sparring sessions and involving 18 different athletes. Each clip is manually segmented and labeled to ensure precise temporal boundaries and class consistency, capturing a wide range of motion styles, camera angles, and athlete physiques. This dataset is specifically curated to support research in real-time vision-based action recognition, especially in low-resource and…
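As an illustration of what a manually segmented, class-labeled clip might look like in code, here is a minimal Python sketch. The schema is hypothetical: the field names (`video_id`, `start_s`, `end_s`, `punch_class`) and the use of integer class indices 0 to 5 are assumptions made for this example, not the dataset's published annotation format, and the six punch-type names are deliberately left out because they are not given in this summary.

```python
from collections import Counter
from dataclasses import dataclass

NUM_CLASSES = 6  # the abstract states six punch types; their names are not assumed here


@dataclass(frozen=True)
class PunchClip:
    """One annotated punch segment (hypothetical schema, not the dataset's own)."""
    video_id: str     # identifier of the source sparring video (assumed field)
    start_s: float    # segment start time in seconds (assumed unit)
    end_s: float      # segment end time in seconds (assumed unit)
    punch_class: int  # class index in [0, NUM_CLASSES)

    def __post_init__(self):
        # Reject segments with inverted or negative temporal boundaries.
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("segment must satisfy 0 <= start_s < end_s")
        if not 0 <= self.punch_class < NUM_CLASSES:
            raise ValueError("punch_class must be in [0, NUM_CLASSES)")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def class_histogram(clips):
    """Count clips per class index, including classes with zero clips."""
    counts = Counter(clip.punch_class for clip in clips)
    return {k: counts.get(k, 0) for k in range(NUM_CLASSES)}


# Toy records for illustration only; these are not real annotations.
clips = [
    PunchClip("video_a", 0.0, 0.5, 0),
    PunchClip("video_a", 1.0, 1.5, 3),
    PunchClip("video_b", 2.0, 2.25, 3),
]
histogram = class_histogram(clips)
```

A per-class histogram like this is a common first check for class imbalance before training a recognition model on any segmented action dataset.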
Taxonomy
Topics: Human Pose and Action Recognition · Sports Performance and Training · Video Analysis and Summarization
