From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation
Zhong Han Ervin Yeoh, and Jiang Kan

TL;DR
This paper introduces two multimodal distillation strategies, AWD and AMD-FED, to improve few-shot precise event spotting in sports videos, demonstrating superior performance on tennis and figure skating datasets.
Contribution
The paper proposes novel adaptive prediction and representation distillation methods that enhance few-shot event spotting accuracy using multimodal knowledge transfer.
Findings
Both methods outperform single-modality baselines and prior approaches.
Representation-level distillation shows stronger performance on the tennis dataset.
AMD-FED generalizes well to the figure skating dataset.
Abstract
Precise Event Spotting (PES) is essential in fast-paced sports such as tennis, where fine-grained events occur within very short temporal windows. Accurate frame-level localization is challenging because of motion blur, subtle action differences, and limited annotated data. We study two complementary distillation strategies for few-shot PES: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively weights teacher supervision on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level framework that transfers robust skeleton knowledge into visual modalities through annealed pseudo-labeling. Both methods use multimodal distillation to improve generalization under limited supervision. We evaluate them on F3Set-Tennis(sub) under few-shot k-clip settings, where they consistently outperform single-modality…
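To make the idea of prediction-level distillation with adaptive weighting concrete, the following is a minimal sketch, not the paper's AWD formulation. It assumes a teacher (for example, a skeleton-based model) that outputs per-frame class probabilities on unlabeled clips, and a student that outputs per-frame logits. The weighting rule (teacher's maximum class probability) and the function names `softmax`, `kl_divergence`, and `weighted_distillation_loss` are illustrative assumptions; the abstract does not specify how the weights are computed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def weighted_distillation_loss(teacher_probs, student_logits, eps=1e-8):
    """Per-frame KL between teacher and student, weighted by teacher confidence.

    ASSUMPTION: the weight is the teacher's maximum class probability, so
    frames where the teacher is unsure contribute less. This is a stand-in
    for whatever adaptive weighting the paper actually uses.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for t_probs, s_logits in zip(teacher_probs, student_logits):
        weight = max(t_probs)
        weighted_sum += weight * kl_divergence(t_probs, softmax(s_logits), eps)
        weight_total += weight
    return weighted_sum / (weight_total + eps)


# Hypothetical two-frame clip with three event classes.
teacher = [[0.90, 0.05, 0.05],   # confident teacher frame
           [0.34, 0.33, 0.33]]   # uncertain teacher frame
uniform_student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = weighted_distillation_loss(teacher, uniform_student)
```

In this sketch the confident first frame dominates the loss, while the near-uniform second frame is down-weighted, which is the general intuition behind weighting teacher supervision on unlabeled data.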
