Temporal Context Network for Activity Localization in Videos
Xiyang Dai, Bharat Singh, Guyue Zhang, Larry S. Davis, Yan Qiu Chen

TL;DR
The paper introduces a Temporal Context Network (TCN) that enhances activity localization in videos by explicitly modeling context around proposals, leading to improved accuracy over existing methods.
Contribution
It proposes a novel context-aware representation for ranking activity proposals and a multi-scale sampling approach within a temporal convolutional framework.
Findings
Outperforms state-of-the-art methods on the ActivityNet dataset
Outperforms state-of-the-art methods on the THUMOS14 dataset
Effective in precise temporal activity localization
Abstract
We present a Temporal Context Network (TCN) for precise temporal localization of human activities. Similar to the Faster-RCNN architecture, proposals are placed at equal intervals in a video which span multiple temporal scales. We propose a novel representation for ranking these proposals. Since pooling features only inside a segment is not sufficient to predict activity boundaries, we construct a representation which explicitly captures context around a proposal for ranking it. For each temporal segment inside a proposal, features are uniformly sampled at a pair of scales and are input to a temporal convolutional neural network for classification. After ranking proposals, non-maximum suppression is applied and classification is performed to obtain final detections. TCN outperforms state-of-the-art methods on the ActivityNet dataset and the THUMOS14 dataset.
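The pipeline described in the abstract (multi-scale proposals at equal intervals, sampling at a pair of scales, then non-maximum suppression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `stride`, `scales`, and `n_samples` parameters, and the choice of a context window twice the proposal length (`context_factor=2.0`) are all assumptions made for illustration, and the ranking network itself is omitted.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def generate_proposals(duration: float, stride: float,
                       scales: List[float]) -> List[Segment]:
    """Place anchors at equal intervals; emit one proposal per scale at each anchor."""
    proposals = []
    n_anchors = int(duration // stride)
    for i in range(n_anchors + 1):
        center = i * stride
        for length in scales:
            start, end = center - length / 2, center + length / 2
            # Keep only proposals that lie fully inside the video.
            if start >= 0 and end <= duration:
                proposals.append((start, end))
    return proposals


def sample_pair_of_scales(seg: Segment, n_samples: int,
                          context_factor: float = 2.0):
    """Uniformly spaced timestamps at two scales: the proposal itself and a
    wider window around it. context_factor is an assumed value; n_samples >= 2."""
    start, end = seg
    center, length = (start + end) / 2, end - start

    def uniform(lo: float, hi: float) -> List[float]:
        step = (hi - lo) / (n_samples - 1)
        return [lo + k * step for k in range(n_samples)]

    half = context_factor * length / 2
    return uniform(start, end), uniform(center - half, center + half)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], scores: List[float],
                 iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring segments, dropping overlapping ones."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(temporal_iou(segments[i], segments[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In a full system, the timestamps returned by `sample_pair_of_scales` would index into per-frame features that are fed to the ranking network, and `temporal_nms` would run on the resulting scores before classification.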
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
