ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos

Syed Ahsan Masud Zaidi, William Hsu, and Scott Dietrich

arXiv:2604.01318 · cs.CV · April 3, 2026

TL;DR

This paper introduces a Vision Transformer (ViT)-based method for detecting risky tackles in American football practice videos. On a dataset of 733 clips (up from 178 in prior work), it raises risky-class recall from 0.58 to 0.67, a result aimed at injury prevention.

Contribution

The work expands the tackle risk detection dataset from 178 annotated videos to 733 clips and shows that Vision Transformers trained with imbalance-aware methods improve detection of the safety-critical risky class.
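The summary does not say which imbalance-aware technique the authors use. As one common possibility, the sketch below shows inverse-frequency class weighting applied to a cross-entropy loss. The function names, the toy labels, and the choice of weighting scheme are all assumptions made for illustration; none of them is taken from the paper.

```python
import math


def inverse_frequency_weights(labels):
    """Return {class: N / (K * n_c)} so that rarer classes get larger weights."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood of the true class.

    probs[i] maps each class name to its predicted probability for example i.
    The sum is normalized by the total weight of the examples.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy labels (hypothetical, NOT the paper's class distribution).
labels = ["safe"] * 9 + ["risky"] * 1
weights = inverse_frequency_weights(labels)

# An uninformative classifier that predicts 0.5 for both classes.
uniform_probs = [{"safe": 0.5, "risky": 0.5}] * len(labels)
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

With these toy labels the single risky example carries the same total weight as the nine safe examples combined, so a model cannot lower the loss much by ignoring the minority class.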

Findings

01

Achieved a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation.

02

Improved risky recall by more than 8 percentage points over the previous baseline (0.58), which was evaluated on a smaller subset.

03

Results indicate that Vision Transformers are effective for safety-critical video analysis.
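The reported recall and F1 values also imply a precision, because F1 = 2PR / (P + R) can be solved for P. The sketch below does that arithmetic for the new result (recall 0.67, F1 0.59) and the previous baseline (recall 0.58, F1 0.56). The helper name `implied_precision` is hypothetical. The inputs are rounded, and F1 averaged across cross-validation folds need not satisfy the identity exactly, so the outputs are rough estimates rather than figures reported by the authors.

```python
def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Values quoted in this summary.
new_precision = implied_precision(recall=0.67, f1=0.59)
old_precision = implied_precision(recall=0.58, f1=0.56)

# Recall gain expressed in percentage points.
recall_gain_points = (0.67 - 0.58) * 100
```

Under these assumptions the implied risky-class precision is roughly 0.53 for the new model and 0.54 for the baseline, so the recall gain of about 9 percentage points appears to come at little cost in precision. The two systems were evaluated on datasets of different sizes, so the comparison is indicative only.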

Abstract

Early identification of hazardous actions in contact sports enables timely intervention and improves player safety. We present a method for detecting risky tackles in American football practice videos and introduce a substantially expanded dataset for this task. The dataset contains 733 single-athlete-dummy tackle clips, each temporally localized around the first point of contact and labeled with a strike zone component of the Standardized Assessment for Tackling Technique (SATT-3), extending prior work that reported 178 annotated videos. Using a Vision Transformer-based model with imbalance-aware training, we obtain a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation. Relative to the previous baseline on a smaller subset (risky recall of 0.58; risky F1 of 0.56), our approach improves risky recall by more than 8 percentage points on a much larger dataset. These results indicate that the vision…
