TL;DR
This paper introduces NetVLAD++, a temporally-aware feature pooling method for action spotting in soccer broadcasts. It pools the context before and after an action separately, which improves localization accuracy and reaches 53.4% Average-mAP on SoccerNet-v2.
Contribution
We propose NetVLAD++, a novel pooling technique that disentangles past and future context for better action localization in sports videos.
Findings
NetVLAD++ achieves 53.4% Average-mAP on SoccerNet-v2, outperforming previous methods.
Disentangling temporal context improves feature discrimination.
Temporally-aware pooling enhances understanding of game actions.
Abstract
Toward the goal of automatic production for sports broadcasts, a paramount task consists in understanding the high-level semantic information of the game in play. For instance, recognizing and localizing the main actions of the game would allow producers to adapt and automatize the broadcast production, focusing on the important details of the game and maximizing the spectator engagement. In this paper, we focus our analysis on action spotting in soccer broadcast, which consists in temporally localizing the main actions in a soccer game. To that end, we propose a novel feature pooling method based on NetVLAD, dubbed NetVLAD++, that embeds temporally-aware knowledge. Different from previous pooling methods that consider the temporal context as a single set to pool from, we split the context before and after an action occurs. We argue that considering the contextual information around the…
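The split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D frame features, the centroid values, and the distance-based soft assignment are all assumptions made for illustration (NetVLAD learns its assignment weights and centroids during training). The sketch pools the frames before and after the window center with two separate simplified NetVLAD layers and concatenates the results.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def netvlad_pool(frames, centroids):
    """Simplified NetVLAD: soft-assign frames to centroids, sum the residuals.

    Assumption: the assignment uses negative squared distance to each
    centroid as a stand-in for the learned assignment layer.
    """
    k, d = len(centroids), len(centroids[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in frames:
        scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
        weights = softmax(scores)
        for j, (w, c) in enumerate(zip(weights, centroids)):
            for i in range(d):
                vlad[j][i] += w * (x[i] - c[i])
    # Intra-normalize each cluster row, flatten, then L2-normalize the result.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)

def netvlad_pp_pool(window, centroids_before, centroids_after):
    """Pool past and future halves of the window separately, then concatenate."""
    half = len(window) // 2
    past, future = window[:half], window[half:]
    return netvlad_pool(past, centroids_before) + netvlad_pool(future, centroids_after)

# Hypothetical toy data: 4 frames with 2-D features, 2 clusters per pooling layer.
past_frames = [[2.0, 0.0], [1.5, 0.2]]
future_frames = [[0.0, 2.0], [0.3, 1.8]]
centroids_before = [[1.0, 0.0], [0.0, 1.0]]
centroids_after = [[1.0, 1.0], [-1.0, -1.0]]

split_desc = netvlad_pp_pool(past_frames + future_frames, centroids_before, centroids_after)
swapped_desc = netvlad_pp_pool(future_frames + past_frames, centroids_before, centroids_after)

# Pooling the whole window as one set ignores which frames came first.
single_desc = netvlad_pool(past_frames + future_frames, centroids_before)
single_swapped = netvlad_pool(future_frames + past_frames, centroids_before)

print(len(split_desc))  # 2 pooling layers * 2 clusters * 2 feature dimensions
print(max(abs(a - b) for a, b in zip(split_desc, swapped_desc)))
print(max(abs(a - b) for a, b in zip(single_desc, single_swapped)))
```

In this toy setup, swapping the two halves of the window changes the split descriptor, while the single-set descriptor stays the same up to floating-point rounding. That is the sense in which a single pooled set discards the before/after ordering that the split retains.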
