TL;DR
This paper introduces RMS-Net, a lightweight modular network for soccer event spotting that predicts event timing and labels, improving accuracy through novel training strategies and achieving state-of-the-art results on SoccerNet.
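To make "predicts event timing and labels" concrete, here is a minimal sketch of how one clip-level output could be turned into a timestamped detection. It assumes, for illustration only, that each fixed-length clip receives a class distribution (with an explicit background class) and a relative event offset in [0, 1], both computed from the same features. `ClipPrediction` and `decode` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ClipPrediction:
    clip_start: float   # seconds: where the clip begins in the match video
    clip_length: float  # seconds: assumed fixed clip duration
    class_probs: dict   # label -> probability, including a "background" label
    offset: float       # predicted relative position of the event, nominally in [0, 1]


def decode(pred: ClipPrediction, background: str = "background"):
    """Turn one clip prediction into (label, timestamp, score), or None for background."""
    # Label head: take the most probable class.
    label = max(pred.class_probs, key=pred.class_probs.get)
    if label == background:
        return None
    # Offset head: clamp to the clip, then map to absolute video time.
    rel = min(max(pred.offset, 0.0), 1.0)
    timestamp = pred.clip_start + rel * pred.clip_length
    return label, timestamp, pred.class_probs[label]


p = ClipPrediction(120.0, 20.0, {"background": 0.1, "goal": 0.7, "card": 0.2}, 0.25)
print(decode(p))  # ('goal', 125.0, 0.7)
```

Sharing one feature vector between the two heads means the timestamp costs only one extra scalar output per clip, which fits the "lightweight" framing.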
Contribution
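The paper presents a lightweight, modular network architecture together with two training strategies (data balancing with uniform sampling, and masking of ambiguous frames) for more accurate event spotting in soccer videos, outperforming existing methods on SoccerNet.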
Findings
Exceeds the state of the art by 3 Average-mAP points on SoccerNet.
Achieves an improvement of more than 10 Average-mAP points with a 2D backbone.
Effective data balancing and masking strategies enhance detection performance.
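The data balancing and masking strategies listed above can be illustrated with a small sketch. This is not the paper's procedure: the annotation format `(label, timestamp)`, the function names, and especially the masking rule (treating frames more than `keep_after` steps after the event as ambiguous) are assumptions chosen only to show the mechanics of drawing an equal number of clips per class, placing the event uniformly inside each clip, and zeroing flagged frames.

```python
import random


def balanced_uniform_samples(events, video_length, clip_length, per_class, seed=0):
    """Draw the same number of training clips for every event class.

    events: list of (label, timestamp_seconds) annotations (hypothetical format).
    Returns a list of (clip_start, label, offset) with offset in [0, 1].
    Assumes video_length >= clip_length.
    """
    rng = random.Random(seed)
    by_label = {}
    for label, t in events:
        by_label.setdefault(label, []).append(t)
    samples = []
    for label in sorted(by_label):
        for _ in range(per_class):
            t = rng.choice(by_label[label])  # rare and frequent classes get equal counts
            offset = rng.random()            # event position drawn uniformly inside the clip
            start = t - offset * clip_length
            # Keep the clip inside the video, then recompute the true offset.
            start = min(max(start, 0.0), video_length - clip_length)
            samples.append((start, label, (t - start) / clip_length))
    return samples


def mask_ambiguous(features, event_index, keep_after):
    """Zero the feature vectors of frames flagged as ambiguous.

    Illustrative rule (an assumption, not the paper's definition): frames more
    than `keep_after` steps after the event index are treated as ambiguous.
    """
    return [
        [0.0] * len(f) if i > event_index + keep_after else list(f)
        for i, f in enumerate(features)
    ]


events = [("goal", 300.0), ("card", 50.0), ("card", 900.0), ("card", 1500.0)]
samples = balanced_uniform_samples(events, 2700.0, 20.0, per_class=5)
print(sum(1 for _, lab, _ in samples if lab == "goal"))  # 5
```

Sampling the offset uniformly keeps the offset head from learning that events always sit at one fixed position in the clip, while equal per-class counts stop frequent classes from dominating training.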
Abstract
The recently proposed action spotting task consists of finding the exact timestamp at which an event occurs. This task is particularly well suited to soccer videos, where events correspond to salient actions strictly defined by soccer rules (a goal occurs when the ball crosses the goal line). In this paper, we devise a lightweight and modular network for action spotting, which can simultaneously predict the event label and its temporal offset using the same underlying features. We enrich our model with two training strategies: the first for data balancing and uniform sampling, the second for masking ambiguous frames and keeping the most discriminative visual cues. When tested on the SoccerNet dataset using standard features, our full proposal exceeds the current state of the art by 3 Average-mAP points. Additionally, it reaches a gain of more than 10 Average-mAP points on the test…
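Results here are reported in Average-mAP. As a rough sketch of what that metric aggregates, the code below computes per-class average precision at a temporal tolerance, averages it over classes (mAP), and then averages mAP over a range of tolerances. Assumptions: a detection counts as a true positive if it lies within `tol` seconds of a still-unmatched ground-truth event, AP is the non-interpolated sum of precision at each hit divided by the number of ground-truth events, and the tolerance list is 5 to 60 seconds in 5-second steps. The official SoccerNet evaluation may use a different window convention and integration scheme, so treat this as an approximation.

```python
def average_precision(preds, gts, tol):
    """AP for one class. preds: list of (time, score); gts: non-empty list of times."""
    matched = [False] * len(gts)
    tp, ap = 0, 0.0
    # Visit detections from most to least confident.
    for k, (t, _) in enumerate(sorted(preds, key=lambda p: -p[1]), start=1):
        best, best_d = None, tol
        for i, g in enumerate(gts):
            d = abs(t - g)
            if not matched[i] and d <= best_d:  # closest unmatched event within tol
                best, best_d = i, d
        if best is not None:
            matched[best] = True
            tp += 1
            ap += tp / k  # precision at this true positive
    return ap / len(gts)


def mean_ap(preds_by_class, gts_by_class, tol):
    """Average AP over classes that have at least one ground-truth event."""
    aps = [
        average_precision(preds_by_class.get(c, []), g, tol)
        for c, g in gts_by_class.items()
        if g
    ]
    return sum(aps) / len(aps)


def average_map(preds_by_class, gts_by_class, tolerances=range(5, 65, 5)):
    """Mean of mAP over the tolerance list (an approximation of the area under the curve)."""
    values = [mean_ap(preds_by_class, gts_by_class, tol) for tol in tolerances]
    return sum(values) / len(values)


gts = {"goal": [100.0], "card": [200.0, 400.0]}
preds = {"goal": [(103.0, 0.9)], "card": [(210.0, 0.8), (1000.0, 0.7)]}
print(mean_ap(preds, gts, 5), mean_ap(preds, gts, 10))  # 0.5 0.75
```

Because the score is averaged over loose and tight tolerances, a method gains Average-mAP both by detecting more events and by placing them closer to the annotated timestamp.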
