CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Zheng Shou; Jonathan Chan; Alireza Zareian; Kazuyuki Miyazawa; Shih-Fu Chang

arXiv:1703.01515·cs.CV·June 14, 2017·62 cites

TL;DR

This paper introduces CDC, a novel neural network that combines convolutional and de-convolutional layers to achieve precise, frame-level temporal action localization in untrimmed videos with high efficiency.

Contribution

The paper proposes the CDC network that performs dense, frame-level predictions for action localization, improving boundary precision and processing speed over existing segment-level methods.
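To make the difference between frame-level and segment-level output concrete, here is a minimal sketch of how per-frame labels could be turned into action instances with start and end times. It is not the authors' post-processing: the function name `frames_to_segments`, the background label `0`, and the toy frame rate are all assumptions chosen for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, fps, background=0):
    """Group runs of identical non-background labels into (label, start_s, end_s)."""
    segments = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != background:
            # Boundaries fall exactly on frame edges, so precision is 1 / fps seconds.
            segments.append((label, index / fps, (index + length) / fps))
        index += length
    return segments


# Hypothetical per-frame predictions: 0 = background, 3 and 5 = action classes.
labels = [0, 0, 3, 3, 3, 3, 0, 0, 5, 5]
segments = frames_to_segments(labels, fps=2.0)
print(segments)  # [(3, 1.0, 3.0), (5, 4.0, 5.0)]
```

With one prediction per frame, each boundary can be placed to within a single frame, whereas a segment-level classifier can only choose among boundaries fixed in advance.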

Findings

01 Achieves superior frame-level action detection accuracy.

02 Significantly improves temporal boundary localization precision.

03 Processes 500 frames per second on a single GPU.

Abstract

Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined boundaries. However, a desirable model should move beyond segment-level and make dense predictions at a fine granularity in time to determine precise temporal boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and…
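The temporal upsampling mentioned in the abstract can be illustrated with a one-dimensional transposed convolution, the basic operation behind de-convolutional layers. The sketch below is only a toy under stated assumptions: the function `transposed_conv1d`, the fixed kernel `[1.0, 1.0]`, and the stride of 2 are hypothetical choices, while the filters in the paper are learned and operate on 3D ConvNet feature maps.

```python
def transposed_conv1d(signal, kernel, stride):
    """1D transposed convolution: each input sample scatters a scaled copy of the kernel."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# A coarse temporal sequence, as it might look after a ConvNet has shortened it.
coarse = [1.0, 2.0, 3.0]

# Assumed kernel size 2 and stride 2: the output is exactly twice as long as the input.
fine = transposed_conv1d(coarse, kernel=[1.0, 1.0], stride=2)
print(fine)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Stacking such layers is one way to recover the temporal length that the convolutional layers removed, so that a prediction can be made for every input frame.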

Code & Models

Repositories

https://bitbucket.org/columbiadvmm/cdc
Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods