SportSkills: Physical Skill Learning from Sports Instructional Videos
Kumar Ashutosh, Chi Hsuan Wu, Kristen Grauman

TL;DR
SportSkills introduces a large-scale sports video dataset focused on physical skill learning. It improves understanding of fine-grained actions and supports personalized instructional retrieval, with representation gains of up to 4x over the same model trained on traditional activity-centric datasets.
Contribution
We present SportSkills, the first large-scale sports dataset for physical skill learning, and develop a novel mistake-conditioned video retrieval task for personalized coaching.
Findings
Representation gains of up to 4x on physical skill understanding
Effective mistake-conditioned instructional video retrieval
Significant improvement in personalized coaching applications
Abstract
Current large-scale video datasets focus on general human activity but lack the depth of coverage on fine-grained activities needed to address physical skill learning. We introduce SportSkills, the first large-scale sports dataset geared towards physical skill learning with in-the-wild video. SportSkills spans 55 varied sports and has more than 360k instructional videos containing more than 630k visual demonstrations paired with instructional narrations explaining the know-how behind the actions. Through a suite of experiments, we show that SportSkills unlocks the ability to understand fine-grained differences between physical actions. Our representation achieves gains of up to 4x over the same model trained on traditional activity-centric datasets. Crucially, building on SportSkills, we introduce the first large-scale task formulation of mistake-conditioned instructional video retrieval,…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
