Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors
João V. B. Soares, Avijit Shah, Topojoy Biswas

TL;DR
This paper introduces a dense detection anchor model for temporally precise action spotting in soccer videos. It combines large temporal contexts with Sharpness-Aware Minimization (SAM) and mixup training to improve localization accuracy.
Contribution
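The paper proposes a dense detection anchor approach that predicts a detection confidence and a fine-grained temporal displacement for each anchor, evaluates two trunk architectures (a one-dimensional u-net and a Transformer encoder), and suggests training best practices, achieving state-of-the-art results on SoccerNet-v2.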
Findings
State-of-the-art accuracy on the SoccerNet-v2 dataset
Predicting temporal displacements improves localization
Training with SAM and mixup enhances performance
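As a rough illustration of the mixup augmentation named in the findings above, the sketch below blends two training examples and their targets with a coefficient drawn from a Beta distribution. This is the generic mixup recipe, not necessarily the exact variant used in the paper: the function name `mixup`, the default `alpha=0.2`, and the choice to blend the targets with the same coefficient are all assumptions made for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples (hypothetical helper, generic mixup recipe).

    x_a, x_b: feature vectors of equal length.
    y_a, y_b: target vectors of equal length (e.g. per-class confidences).
    alpha:    Beta distribution parameter; 0.2 is an assumed value.
    rng:      the random module or a random.Random instance.
    """
    # Sample the mixing coefficient lam in [0, 1] from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Take the same convex combination of inputs and of targets.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


if __name__ == "__main__":
    seeded = random.Random(0)
    x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=seeded)
    print(f"lam={lam:.3f} x={x} y={y}")
```

A small `alpha` concentrates the coefficient near 0 or 1, so most blended examples stay close to one of the two originals.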
Abstract
We present a model for temporally precise action spotting in videos, which uses a dense set of detection anchors, predicting a detection confidence and corresponding fine-grained temporal displacement for each anchor. We experiment with two trunk architectures, both of which are able to incorporate large temporal contexts while preserving the smaller-scale features required for precise localization: a one-dimensional version of a u-net, and a Transformer encoder (TE). We also suggest best practices for training models of this kind, by applying Sharpness-Aware Minimization (SAM) and mixup data augmentation. We achieve a new state-of-the-art on SoccerNet-v2, the largest soccer video dataset of its kind, with marked improvements in temporal localization. Additionally, our ablations show: the importance of predicting the temporal displacements; the trade-offs between the u-net and TE…
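The abstract describes each anchor as producing a detection confidence and a fine-grained temporal displacement. A minimal sketch of how such outputs might be decoded into action timestamps for a single action class follows. The function name `decode_anchors`, the anchor spacing, the confidence threshold, the displacement unit (seconds), and the greedy temporal non-maximum suppression step are all assumptions made for illustration; the abstract does not specify the paper's post-processing.

```python
def decode_anchors(confidences, displacements,
                   anchor_spacing=0.5, threshold=0.5, nms_window=2.0):
    """Turn per-anchor outputs into (time_seconds, confidence) detections.

    confidences:    one score in [0, 1] per anchor.
    displacements:  one predicted offset per anchor, assumed to be in seconds.
    anchor_spacing: assumed time between consecutive anchors, in seconds.
    threshold:      assumed minimum confidence for a candidate detection.
    nms_window:     assumed suppression radius, in seconds.
    """
    candidates = []
    for i, (conf, disp) in enumerate(zip(confidences, displacements)):
        if conf >= threshold:
            # Refined time = anchor position + predicted displacement.
            candidates.append((i * anchor_spacing + disp, conf))

    # Greedy temporal NMS (an assumption): keep the most confident candidate
    # and drop any later candidate that falls within nms_window of a kept one.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for time, conf in candidates:
        if all(abs(time - kept_time) > nms_window for kept_time, _ in kept):
            kept.append((time, conf))
    return sorted(kept)


if __name__ == "__main__":
    confs = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1, 0.1, 0.7]
    disps = [0.0, 0.2, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, -0.1]
    for time, conf in decode_anchors(confs, disps):
        print(f"action at ~{time:.2f}s (confidence {conf:.2f})")
```

In this toy input, the two neighbouring anchors at indices 1 and 2 are displaced onto nearly the same timestamp, so the suppression step keeps only the more confident one. The displacement lets a coarse anchor grid report times that fall between anchor positions.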
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Diabetic Foot Ulcer Assessment and Management
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Max Pooling · Concatenated Skip Connection · Position-Wise Feed-Forward Layer
