Rethinking the Faster R-CNN Architecture for Temporal Action Localization

Yu-Wei Chao; Sudheendra Vijayanarasimhan; Bryan Seybold; David A. Ross; Jia Deng; Rahul Sukthankar

arXiv:1804.07667 · cs.CV · April 23, 2018 · 33 cites

Open Access

TL;DR

TAL-Net is a novel temporal action localization framework that enhances receptive field alignment, exploits temporal context, and emphasizes multi-stream feature fusion, achieving state-of-the-art results on key benchmarks.

Contribution

The paper introduces TAL-Net, which adapts Faster R-CNN to temporal action localization in video by improving receptive field alignment, extending receptive fields to capture temporal context, and fusing multi-stream features late.

Findings

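01 Achieves state-of-the-art action proposal and localization results on the THUMOS'14 detection benchmark

02 Achieves competitive results on the ActivityNet challenge

03 Demonstrates that fusing motion features late is important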

Abstract

We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment using a multi-scale architecture that can accommodate extreme variation in action durations; (2) we better exploit the temporal context of actions for both proposal generation and action classification by appropriately extending receptive fields; and (3) we explicitly consider multi-stream feature fusion and demonstrate that fusing motion late is important. We achieve state-of-the-art performance for both action proposal and localization on the THUMOS'14 detection benchmark and competitive performance on the ActivityNet challenge.
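Points (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `extend_segment` and `late_fuse`, the default extension ratio of 0.5, and the choice to average per-class logits before a softmax are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def extend_segment(start, end, ratio=0.5):
    """Widen a temporal segment on both sides to take in surrounding context.

    Hypothetical helper: `ratio` (fraction of the segment length added on
    each side) is an assumed value, not one stated in the abstract.
    """
    length = end - start
    return start - ratio * length, end + ratio * length


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(rgb_logits, flow_logits):
    """Late fusion sketch: each stream is scored separately, and only the
    per-class logits are combined (here, by an assumed element-wise mean)."""
    fused = [(r + f) / 2.0 for r, f in zip(rgb_logits, flow_logits)]
    return softmax(fused)


# A proposal spanning seconds 10 to 20, widened by half its length per side.
context_window = extend_segment(10.0, 20.0)

# Two-class scores from an appearance (RGB) stream and a motion (flow) stream.
probs = late_fuse([2.0, 0.0], [0.0, 0.0])
```

The contrast with early fusion is that the two streams never share features here; they are combined only at the score level, which is one simple way to read "fusing motion late".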

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications

Methods: Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN