ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos
Syed Ahsan Masud Zaidi, William Hsu, and Scott Dietrich

TL;DR
This paper introduces a Vision Transformer-based method for detecting risky tackles in American football practice videos. It improves risky-tackle recall over the previous baseline while evaluating on a substantially larger dataset, with the goal of supporting injury prevention.
Contribution
The work expands the tackle risk detection dataset from 178 to 733 annotated clips and shows that Vision Transformers trained with imbalance-aware methods improve detection of safety-critical actions.
Findings
Achieved a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation.
Improved risky recall by more than 8 percentage points over the previous baseline (risky recall 0.58, risky F1 0.56), which was evaluated on a smaller subset.
Results support the use of Vision Transformers for safety-critical video analysis.
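The risky recall and risky F1 figures above are per-class metrics computed with the risky class treated as the positive label. A minimal sketch of that computation follows. The function name `risky_metrics` and the toy labels are assumptions made for illustration; they are not the paper's code or data.

```python
def risky_metrics(y_true, y_pred, risky_label=1):
    """Return (precision, recall, f1) with the risky class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == risky_label and p == risky_label)
    fp = sum(1 for t, p in pairs if t != risky_label and p == risky_label)
    fn = sum(1 for t, p in pairs if t == risky_label and p != risky_label)
    # Guard against empty denominators (no predicted or no actual risky clips).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (1 = risky, 0 = not risky).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = risky_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall is the metric to watch in a safety setting because a missed risky tackle (false negative) is usually costlier than a false alarm. F1 shows how much precision is traded away to achieve that recall.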
Abstract
Early identification of hazardous actions in contact sports enables timely intervention and improves player safety. We present a method for detecting risky tackles in American football practice videos and introduce a substantially expanded dataset for this task. Our dataset contains 733 single-athlete-dummy tackle clips, each temporally localized around the first point of contact and labeled with a strike zone component of the Standardized Assessment for Tackling Technique (SATT-3), extending prior work that reported 178 annotated videos. Using a Vision Transformer-based model with imbalance-aware training, we obtain a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation. Relative to the previous baseline on a smaller subset (risky recall 0.58; risky F1 0.56), our approach improves risky recall by more than 8 percentage points on a much larger dataset. These results indicate that the vision…
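The abstract does not say which imbalance-aware training technique is used. One common option is to weight the loss by inverse class frequency, so that errors on the rarer risky class cost more. The sketch below shows that option under stated assumptions: the helper names, the weighting formula, and the toy label counts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted mean negative log-likelihood.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical imbalanced toy set: 8 non-risky (0) and 2 risky (1) clips.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)  # risky class gets the larger weight

# A model that is confident on class 0 but unsure on class 1.
probs = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2
print(weights, weighted_cross_entropy(probs, labels, weights))
```

With these weights, each class contributes equally to the loss in aggregate, regardless of how many clips it has. In a deep learning framework, the same idea is usually expressed by passing per-class weights to the cross-entropy loss.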
