A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage
Levi Harris

TL;DR
This paper introduces a straightforward, accurate, and scalable pipeline for temporally grounding basketball broadcast videos, facilitating the creation of large datasets for sports video analysis without complex localization steps.
Contribution
The proposed method simplifies temporal grounding by avoiding game clock localization, enabling fast, general, and scalable annotation of basketball videos for sports analytics.
Findings
Accurately extracts time-remaining and quarter info from broadcast footage.
Speeds up dataset creation for sports video models.
Supports deployment in large-scale computing environments.
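The Findings above mention extracting time-remaining and quarter values, but this page does not describe how recognized text is normalized. As a minimal sketch, assuming some text reader has already returned raw strings for the two semantic text regions, converting them into numeric values might look like the following. The accepted string formats (`11:42`, `24.7`, `1st`, `Q3`, `4th Qtr`, `OT`, `2OT`) and the helper names `parse_time_remaining` and `parse_quarter` are hypothetical illustrations, not taken from the paper.

```python
import re
from typing import Optional

# Hypothetical formats (assumption): the time-remaining region reads "MM:SS"
# (e.g. "11:42") or, under one minute, "SS.t" (e.g. "24.7").
_CLOCK_MMSS = re.compile(r"^(\d{1,2}):([0-5]\d)$")
_CLOCK_SS = re.compile(r"^(\d{1,2})(?:\.(\d))?$")
# Hypothetical formats (assumption): the quarter region reads "1st", "Q3",
# "4th Qtr", or an overtime label such as "OT" / "2OT".
_QUARTER = re.compile(r"^(?:Q\s*)?([1-4])(?:ST|ND|RD|TH)?(?:\s+QTR)?$")
_OVERTIME = re.compile(r"^(\d)?\s*OT$")


def parse_time_remaining(text: str) -> Optional[float]:
    """Convert a time-remaining string to seconds, or None if unparseable."""
    t = text.strip()
    m = _CLOCK_MMSS.match(t)
    if m:
        return float(int(m.group(1)) * 60 + int(m.group(2)))
    m = _CLOCK_SS.match(t)
    if m:
        tenths = int(m.group(2)) if m.group(2) else 0
        return int(m.group(1)) + tenths / 10
    return None


def parse_quarter(text: str) -> Optional[int]:
    """Map a quarter label to 1-4; overtime periods become 5, 6, ..."""
    t = text.strip().upper()
    m = _QUARTER.match(t)
    if m:
        return int(m.group(1))
    m = _OVERTIME.match(t)
    if m:
        # "OT" is the first overtime (period 5), "2OT" the second (period 6).
        return 4 + int(m.group(1) or 1)
    return None
```

Returning `None` instead of raising keeps unreadable frames explicit, so a later step can decide whether to interpolate over them or discard them.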
Abstract
We present a reliable temporal grounding pipeline for video-to-analytic alignment of basketball broadcast footage. Given a sequence of frames as input, our method quickly and accurately extracts time-remaining and quarter values from basketball broadcast scenes. Our work aims to expedite the development of large, multi-modal video datasets for training data-hungry video models in the sports action recognition domain. Our method aligns a pre-labeled corpus of play-by-play logs containing dense event annotations to video frames, enabling quick retrieval of labeled video segments. Unlike previous methods, we forgo the need to localize game clocks by fine-tuning an off-the-shelf object detector to find semantic text regions directly. This end-to-end approach makes our method more general. Additionally, interpolation and parallelization techniques prepare our pipeline for…
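The abstract mentions interpolation and the alignment of play-by-play annotations to video frames, but the details are not given on this page. A minimal sketch, assuming each sampled frame carries a `(quarter, seconds_remaining)` reading or `None` where text reading failed: gaps bounded by two readings from the same quarter are filled by linear interpolation, and an event is mapped to the first frame whose clock has reached the event's time. Linear interpolation is an assumed stand-in, not necessarily the paper's technique, and `interpolate_readings` and `align_event` are hypothetical names.

```python
from typing import List, Optional, Tuple

# One reading per sampled frame: (quarter, seconds remaining), or None if the
# clock/quarter text could not be read for that frame (assumed representation).
Reading = Optional[Tuple[int, float]]


def interpolate_readings(readings: List[Reading]) -> List[Reading]:
    """Fill runs of None that sit between two readings from the same quarter.

    Runs at the start or end of the list, or spanning a quarter change, are
    left as None because there is no safe pair of endpoints to interpolate.
    """
    filled = list(readings)
    valid = [i for i, r in enumerate(readings) if r is not None]
    for a, b in zip(valid, valid[1:]):
        qa, ta = readings[a]
        qb, tb = readings[b]
        if b - a < 2 or qa != qb:
            continue  # no gap between a and b, or the quarter changed
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            filled[i] = (qa, ta + frac * (tb - ta))
    return filled


def align_event(filled: List[Reading], quarter: int,
                seconds_remaining: float) -> Optional[int]:
    """Return the first frame index in `quarter` whose clock has counted down
    to (or past) the event's time remaining, or None if no frame matches."""
    for i, r in enumerate(filled):
        if r is not None and r[0] == quarter and r[1] <= seconds_remaining:
            return i
    return None


# Usage: two unread frames inside quarter 1, and one unread frame at a quarter change.
readings = [(1, 720.0), None, None, (1, 717.0), (1, 716.0), None, (2, 720.0)]
filled = interpolate_readings(readings)
print(align_event(filled, quarter=1, seconds_remaining=718.5))
```

Because the game clock stops during dead balls, linear interpolation is only an approximation and is safest over short gaps. A frame index returned by `align_event` could then be converted to a timestamp using the frame sampling rate, which is not specified here.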
Taxonomy
Topics: Video Analysis and Summarization
