SRF-Net: Selective Receptive Field Network for Anchor-Free Temporal Action Detection

Ranyu Ning; Can Zhang; Yuexian Zou

arXiv:2106.15258 · cs.CV · June 30, 2021

TL;DR

SRF-Net is an anchor-free model for temporal action detection that adaptively adjusts receptive fields to better localize actions in untrimmed videos, improving over existing anchor-based methods.

Contribution

The paper introduces SRF-Net, a novel anchor-free temporal action detection (TAD) model with adaptive receptive fields, trained end-to-end for better generalization in action localization.

Findings

01. Outperforms state-of-the-art methods on the THUMOS14 dataset
02. Effectively adapts receptive fields to variations in action scale
03. Eliminates the need for pre-defined anchors in TAD
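The second finding credits SRF-Net with adapting its receptive field to action scale. The abstract in this listing is cut off before the Selective Receptive Field block is described, so the sketch below is not the paper's block. It is a hypothetical, single-channel illustration in the style of selective-kernel attention (as in SKNet): parallel dilated 1D convolutions give branches with different receptive fields, and a softmax gate computed from a global summary of the input decides how much each branch contributes. The names `selective_receptive_field` and `gate_weights` are illustrative assumptions, and the "learned" gate weights are simply passed in.

```python
import math
from typing import List, Tuple


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Length-preserving 1D convolution (cross-correlation, zero padding) with dilation.

    Assumes an odd kernel length so the window is centred on each position.
    """
    half = (len(kernel) // 2) * dilation
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - half + i * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selective_receptive_field(
    x: List[float],
    kernel: List[float],
    dilations: List[int],
    gate_weights: List[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical selective-kernel-style block (not the paper's exact design).

    Each dilation yields one branch with a different receptive field; a softmax
    gate over the branches, driven by a global descriptor, mixes them.
    """
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    # Fuse branches by summation, then average over time (global descriptor).
    fused = [sum(vals) for vals in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # One logit per branch from hypothetical learned gate weights.
    attn = softmax([w * descriptor for w in gate_weights])
    # Output: attention-weighted sum of the branches at every time step.
    out = [sum(a * branch[t] for a, branch in zip(attn, branches)) for t in range(len(x))]
    return out, attn
```

For example, with `x = [0, 0, 1, 0, 0]`, kernel `[1, 1, 1]`, dilations `[1, 2]` and zero gate weights, the gate is uniform (`[0.5, 0.5]`) and the output is the plain average of the two branches, `[0.5, 0.5, 1.0, 0.5, 0.5]`. Non-zero gate weights shift the mix toward the narrower or wider receptive field depending on the input, which is the kind of scale adaptivity the finding refers to.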

Abstract

Temporal action detection (TAD) is a challenging task that aims to temporally localize and recognize human actions in untrimmed videos. Current mainstream one-stage TAD approaches localize and classify action proposals using pre-defined anchors, whose locations and scales for action instances are set by designers. Such anchor-based TAD methods have limited generalization capability and degrade in performance when videos contain rich action variation. In this study, we explore removing the requirement of pre-defined anchors from TAD methods. We develop a novel TAD model, termed the Selective Receptive Field Network (SRF-Net), in which the location offsets and classification scores at each temporal location are estimated directly from the feature map, and SRF-Net is trained end-to-end. Innovatively, a building block called Selective…
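The abstract states that SRF-Net predicts location offsets and classification scores directly at each temporal location of the feature map instead of refining pre-defined anchors. The sketch below shows one way such per-location outputs could be turned into detections. It rests on assumptions not stated in this listing: offsets are taken to be FCOS-style distances from a location to the action's start and end, scaled by a hypothetical feature stride, and duplicates are removed with a generic greedy 1D non-maximum suppression. The paper's exact parameterization and post-processing may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # in the same time unit as the stride (e.g. frames or seconds)
    end: float
    label: int
    score: float


def decode_locations(
    offsets: List[Tuple[float, float]],
    scores: List[List[float]],
    stride: float,
    score_thresh: float,
) -> List[Segment]:
    """Turn per-location predictions into candidate action segments.

    Assumed parameterization: offsets[t] = (d_start, d_end) is the distance from
    location t to the action's start and end, in feature-map units.
    """
    segments = []
    for t, ((d_start, d_end), cls_scores) in enumerate(zip(offsets, scores)):
        label = max(range(len(cls_scores)), key=lambda c: cls_scores[c])
        score = cls_scores[label]
        if score < score_thresh:
            continue  # treat low-confidence locations as background
        center = (t + 0.5) * stride  # centre of location t on the input time axis
        start = max(0.0, center - d_start * stride)  # clamp to the video start
        segments.append(Segment(start, center + d_end * stride, label, score))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def nms(segments: List[Segment], iou_thresh: float) -> List[Segment]:
    """Greedy class-wise 1D non-maximum suppression."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if all(seg.label != k.label or temporal_iou(seg, k) < iou_thresh for k in kept):
            kept.append(seg)
    return kept
```

With stride 4 and offsets `(1, 1)` and `(2, 1)` at locations 0 and 1, both predicted as class 0, decoding gives segments [0, 6] and [0, 10] (the second start is clamped at 0). Their temporal IoU is 0.6, so NMS at a 0.5 threshold keeps only the higher-scoring one. No anchor set is needed because each feature-map location itself serves as the reference point for its prediction.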

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: Convolution