Temporal Context Network for Activity Localization in Videos

Xiyang Dai; Bharat Singh; Guyue Zhang; Larry S. Davis; Yan Qiu Chen

arXiv:1708.02349 · cs.CV · August 9, 2017 · 66 citations

TL;DR

The paper introduces a Temporal Context Network (TCN) that enhances activity localization in videos by explicitly modeling context around proposals, leading to improved accuracy over existing methods.

Contribution

It proposes a novel context-aware representation for ranking activity proposals and a multi-scale sampling approach within a temporal convolutional framework.
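The context-aware representation pairs features from inside a proposal with features from a wider window around it, samples both uniformly, and passes them to a temporal convolutional network. The pure-Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation. Per-time-step features are given as a list of vectors. The context window is co-centred with the proposal and twice its length; this factor (`context_scale`) is a hypothetical choice. Sampling uses the nearest time step. A single valid 1-D convolution with ReLU stands in for the network. The helper names `sample_pair_of_scales` and `temporal_conv1d` are illustrative.

```python
import random


def sample_pair_of_scales(features, start, end, num_samples=8, context_scale=2.0):
    """Hypothetical helper: sample features at evenly spaced positions inside
    a proposal and inside a wider window sharing its centre.

    features: list of T per-time-step feature vectors (lists of floats).
    start, end: proposal boundaries in time-step units.
    context_scale: assumed enlargement factor of the context window.
    Returns [proposal_sequence, context_sequence], each num_samples long.
    """
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    T = len(features)
    center = 0.5 * (start + end)
    length = end - start
    pair = []
    for scale in (1.0, context_scale):
        half = 0.5 * length * scale
        step = 2.0 * half / (num_samples - 1)
        seq = []
        for k in range(num_samples):
            pos = center - half + k * step
            # Nearest time step, clipped so windows past the video edges stay valid.
            idx = min(max(int(round(pos)), 0), T - 1)
            seq.append(features[idx])
        pair.append(seq)
    return pair


def temporal_conv1d(x, weights, bias):
    """Valid 1-D convolution over time followed by ReLU.

    x: K x C_in nested list; weights: width x C_in x C_out; bias: C_out list.
    Returns a (K - width + 1) x C_out nested list.
    """
    width, c_in, c_out = len(weights), len(x[0]), len(bias)
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for o in range(c_out):
            acc = bias[o]
            for w in range(width):
                for i in range(c_in):
                    acc += x[t + w][i] * weights[w][i][o]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


# Example usage with random stand-in features and untrained weights.
random.seed(0)
feats = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(100)]
pair = sample_pair_of_scales(feats, 40, 60)                  # 2 x 8 x 16
fused = [a + b for a, b in zip(pair[0], pair[1])]            # concat scales: 8 x 32
weights = [[[random.gauss(0.0, 0.1) for _ in range(4)] for _ in range(32)] for _ in range(3)]
h = temporal_conv1d(fused, weights, [0.0] * 4)               # 6 x 4
score = sum(max(col) for col in zip(*h)) / 4                 # max over time, mean over channels
```

Concatenating the two scales along the channel axis lets a single convolution see both the proposal interior and its surroundings at every sampled step. This mirrors the motivation that pooling only inside a segment is not enough to judge its boundaries.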

Findings

1. Outperforms state-of-the-art methods on the ActivityNet dataset
2. Outperforms state-of-the-art methods on the THUMOS14 dataset
3. Effective for precise temporal activity localization

Abstract

We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster R-CNN architecture, proposals are placed at equal intervals in a video which span multiple temporal scales. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation which explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and are input to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
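The abstract outlines a Faster R-CNN-style pipeline: proposals at equal intervals across several temporal scales, ranking, then non-maximum suppression. The pure-Python sketch below shows only the generic anchor placement and a greedy temporal NMS step. The stride, the list of proposal lengths, and the 0.5 IoU threshold are illustrative assumptions, not values reported by the authors. The toy scores stand in for the output of the ranking network. The function names are hypothetical.

```python
def make_anchors(video_length, stride, lengths):
    """Place proposals of several lengths at equally spaced centres.

    stride (spacing of centres) and lengths (proposal durations) are
    hypothetical parameters in time-step units. Segments are clipped to
    [0, video_length]. Returns a list of (start, end) tuples.
    """
    anchors = []
    for length in lengths:
        center = stride / 2
        while center < video_length:
            start = max(center - length / 2, 0.0)
            end = min(center + length / 2, float(video_length))
            anchors.append((start, end))
            center += stride
    return anchors


def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring segment, drop segments that overlap
    it by more than iou_threshold, and repeat on what remains."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if temporal_iou(segments[best], segments[i]) <= iou_threshold]
    return keep


# Example usage: toy scores that favour segments centred near t = 50.
anchors = make_anchors(video_length=100, stride=10, lengths=[16, 32, 64])
scores = [1.0 / (1.0 + abs(0.5 * (s + e) - 50.0)) for s, e in anchors]
kept = temporal_nms(anchors, scores, iou_threshold=0.5)
detections = [anchors[i] for i in kept]
```

Greedy NMS is shown because it is the common choice for this step. The abstract does not say which NMS variant or threshold TCN uses.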

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods