COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

TL;DR
COMEDIAN introduces a pretraining pipeline for spatiotemporal transformers based on self-supervised learning and knowledge distillation, significantly improving action spotting accuracy and convergence speed on SoccerNet-v2.
Contribution
It proposes a novel three-step pipeline for action spotting with transformers: two pretraining stages based on self-supervised learning and knowledge distillation, followed by fine-tuning.
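The three steps can be written down as an ordered schedule. The sketch below is illustrative only. `Step`, `COMEDIAN_STEPS`, `validate` and `run_pipeline` are hypothetical names, and `train_fn` stands in for training code that is not described here. The schedule encodes only what the abstract states: spatial initialization, then temporal initialization, then fine-tuning of both transformers. It also checks that no module is fine-tuned before it has been initialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of the pipeline (hypothetical representation, not the authors' code)."""
    name: str
    modules: tuple   # transformer(s) this step initializes or fine-tunes
    objective: str   # training signal used in this step
    is_init: bool    # True for the two initialization stages


COMEDIAN_STEPS = (
    Step("spatial init", ("spatial",), "self-supervised learning on short videos", True),
    Step("temporal init", ("temporal",), "knowledge distillation from a pre-computed feature bank", True),
    Step("fine-tuning", ("spatial", "temporal"), "supervised action spotting", False),
)


def validate(steps):
    """Raise if a non-initialization step uses a module no earlier step initialized."""
    initialized = set()
    for step in steps:
        if step.is_init:
            initialized.update(step.modules)
        elif not set(step.modules) <= initialized:
            raise ValueError(f"{step.name!r} uses uninitialized modules")
    return True


def run_pipeline(steps, train_fn):
    """Run the steps in order; `train_fn` is a hypothetical training callback."""
    validate(steps)
    completed = []
    for step in steps:
        train_fn(step)
        completed.append(step.name)
    return completed
```

With a no-op callback, `run_pipeline(COMEDIAN_STEPS, lambda step: None)` returns the step names in order. Moving the fine-tuning step first makes `validate` raise, which reflects that the two initialization stages must precede fine-tuning.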
Findings
Achieves state-of-the-art results on SoccerNet-v2.
Pretraining improves performance over non-pretrained models.
Faster convergence during training.
Abstract
We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, with two initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Additionally, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight…
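In the distillation step, each short segment is paired with the entries of the pre-computed feature bank that cover the same time span. The sketch below shows one way to handle that bookkeeping in plain Python. It relies on assumptions the abstract does not settle. Bank entry `i` is assumed to be extracted at `i / bank_fps` seconds. The temporal transformer is assumed to emit one output vector per aligned bank entry. The objective is a cosine distance chosen for illustration and is not necessarily the paper's loss. All function names are hypothetical.

```python
import math


def bank_indices(start_s, duration_s, bank_fps):
    """Indices of bank entries with timestamps in [start_s, start_s + duration_s).

    Assumption: entry i of the bank was extracted at time i / bank_fps seconds.
    Rounding guards against float noise such as 0.3 * 10 == 3.0000000000000004.
    """
    first = math.ceil(round(start_s * bank_fps, 9))
    stop = math.ceil(round((start_s + duration_s) * bank_fps, 9))  # exclusive bound
    return list(range(first, stop))


def segment_targets(bank, start_s, duration_s, bank_fps):
    """Teacher targets for one short segment, read from the pre-computed bank."""
    return [bank[i] for i in bank_indices(start_s, duration_s, bank_fps)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distillation_loss(student, teacher):
    """Mean (1 - cosine similarity) over aligned (student, teacher) pairs.

    Illustrative objective only; the abstract does not specify the loss.
    Assumes one student vector per aligned teacher vector.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher sequences must be non-empty and aligned")
    total = sum(1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher))
    return total / len(student)
```

For a toy bank sampled at 2 entries per second, `segment_targets(bank, 1.0, 2.0, 2)` selects entries 2 to 5, covering seconds 1.0 to 2.5. In a real training loop the student vectors would come from the temporal transformer applied to the spatial transformer's outputs. Replacing the cosine distance with, for example, a mean squared error would only change `distillation_loss`.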
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: Knowledge Distillation · Spatial Transformer
