SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection
Ranyu Ning, Can Zhang, Yuexian Zou

TL;DR
SRF-Net is an anchor-free model for temporal action detection that adaptively adjusts receptive fields to better localize actions in untrimmed videos, improving over existing anchor-based methods.
Contribution
The paper introduces SRF-Net, a novel anchor-free TAD model with adaptive receptive fields, trained end-to-end for improved generalization in action localization.
Findings
Outperforms state-of-the-art methods on the THUMOS14 dataset
Effectively adapts receptive fields to variations in action scale
Eliminates the need for pre-defined anchors in TAD
Abstract
Temporal action detection (TAD) is a challenging task that aims to temporally localize and recognize human actions in untrimmed videos. Current mainstream one-stage TAD approaches localize and classify action proposals using pre-defined anchors, where the location and scale of action instances are set by designers. Such anchor-based methods have limited generalization capability and suffer performance degradation when videos contain rich action variation. In this study, we explore removing the requirement of pre-defined anchors for TAD methods. We develop a novel TAD model termed Selective Receptive Field Network (SRF-Net), in which the location offsets and classification scores at each temporal location are estimated directly on the feature map, and which is trained in an end-to-end manner. Innovatively, a building block called Selective…
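The abstract says that location offsets and classification scores are estimated at each temporal location, without anchors. The sketch below shows one way such per-location outputs could be decoded into action segments. It is an illustration under stated assumptions, not the paper's implementation: the function name `decode_segments`, the offset convention (distances from a location to the action start and end), the `stride` parameter, and the score threshold are all hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, int, float]  # (start, end, class_id, score)


def decode_segments(
    offsets: List[Tuple[float, float]],
    scores: List[List[float]],
    stride: float = 1.0,
    score_threshold: float = 0.5,
) -> List[Segment]:
    """Decode per-location predictions into temporal segments.

    Assumed convention (not taken from the paper): offsets[t] holds
    (d_start, d_end), the distances from feature-map location t to the
    action's start and end, in feature-map units. `stride` converts
    feature-map units to time units.
    """
    if len(offsets) != len(scores):
        raise ValueError("offsets and scores must have the same length")

    segments: List[Segment] = []
    for t, ((d_start, d_end), class_scores) in enumerate(zip(offsets, scores)):
        # Pick the highest-scoring class at this location.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = class_scores[class_id]
        if score < score_threshold:
            continue  # low-confidence locations produce no segment
        # No anchor is involved: the segment is defined by the location itself.
        start = max(0.0, (t - d_start) * stride)
        end = (t + d_end) * stride
        segments.append((start, end, class_id, score))
    return segments


if __name__ == "__main__":
    demo_offsets = [(0.0, 1.0), (1.0, 2.0), (0.5, 0.5)]
    demo_scores = [[0.1, 0.2], [0.9, 0.3], [0.2, 0.7]]
    for seg in decode_segments(demo_offsets, demo_scores, stride=4.0):
        print(seg)
```

In a full detector, overlapping segments decoded this way would typically be pruned afterwards, for example with non-maximum suppression; that step is omitted here.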
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: Convolution
