A Real-time Action Representation with Temporal Encoding and Deep Compression

Kun Liu; Wu Liu; Huadong Ma; Mingkui Tan; Chuang Gan

arXiv:2006.09675·cs.CV·June 18, 2020

TL;DR

This paper introduces T-C3D, a real-time 3D convolutional neural network for video action recognition that combines hierarchical temporal encoding and deep compression to achieve high accuracy and speed in practical applications.

Contribution

The paper presents a novel real-time convolutional architecture, T-C3D, which integrates a residual 3D CNN, temporal encoding, and deep compression for efficient and effective action recognition.

Findings

01

Achieves 5.4% higher accuracy than comparable real-time methods on the UCF101 benchmark.

02

Runs twice as fast as comparable real-time methods.

03

Model size is less than 5MB, enabling deployment on resource-limited devices.
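
To put the under-5MB figure in context, the short sketch below relates weight storage to parameter count and bits per weight. It is only back-of-the-envelope arithmetic: the function names and the example bit widths are hypothetical, and it makes no claim about the actual parameter count or quantization scheme used by T-C3D.

```python
# Illustrative arithmetic only; names and bit widths are assumptions,
# not values taken from the paper.

BYTES_PER_MB = 1024 * 1024


def model_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in MB (1 MB = 1024 * 1024 bytes)."""
    return num_params * bits_per_weight / 8 / BYTES_PER_MB


def max_params_within(budget_mb: float, bits_per_weight: int) -> int:
    """Largest number of weights that fits in the given storage budget."""
    total_bits = int(budget_mb * BYTES_PER_MB * 8)
    return total_bits // bits_per_weight


if __name__ == "__main__":
    # Compare how many weights fit in 5 MB at a few hypothetical precisions.
    for bits in (32, 8, 1):
        print(bits, "bits per weight:", max_params_within(5, bits), "parameters")
```

Under these assumptions, a 5MB budget holds roughly 1.3 million 32-bit weights, and proportionally more as the bits per weight shrink, which is why compression matters for deployment on constrained devices.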

Abstract

Deep neural networks have achieved remarkable success in video-based action recognition. However, most existing approaches cannot be deployed in practice due to their high computational cost. To address this challenge, we propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while achieving a high processing speed. Specifically, we propose a residual 3D Convolutional Neural Network (CNN) to capture complementary information on the appearance of a single frame and the motion between consecutive frames. Based on this CNN, we develop a new temporal encoding method to explore the temporal dynamics of the whole video. Furthermore, we integrate deep compression techniques with T-C3D to further accelerate the deployment of models via…
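
The idea of encoding a whole video from clip-level features can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the number of segments, the clip length, sampling from the middle of each segment, and average or max pooling as the aggregation step are not taken from the abstract above, and `clip_encoder` is a hypothetical stand-in for the residual 3D CNN.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def sample_clips(num_frames: int, num_segments: int, clip_len: int) -> List[List[int]]:
    """Split the video into equal segments and take one clip of
    consecutive frame indices around the middle of each segment."""
    if num_segments < 1 or clip_len < 1 or num_frames < clip_len:
        raise ValueError("video too short or invalid sampling parameters")
    seg_len = num_frames / num_segments
    clips = []
    for s in range(num_segments):
        center = int((s + 0.5) * seg_len)
        # Clamp so the clip stays inside the video.
        start = min(max(center - clip_len // 2, 0), num_frames - clip_len)
        clips.append(list(range(start, start + clip_len)))
    return clips


def aggregate(features: Sequence[Vector], mode: str = "avg") -> Vector:
    """Pool clip-level feature vectors into one video-level vector."""
    if mode == "avg":
        return [sum(col) / len(features) for col in zip(*features)]
    if mode == "max":
        return [max(col) for col in zip(*features)]
    raise ValueError(f"unknown aggregation mode: {mode}")


def encode_video(
    num_frames: int,
    clip_encoder: Callable[[List[int]], Vector],  # placeholder for a 3D CNN
    num_segments: int = 3,
    clip_len: int = 8,
    mode: str = "avg",
) -> Vector:
    """Encode each sampled clip with the shared encoder, then aggregate."""
    clips = sample_clips(num_frames, num_segments, clip_len)
    return aggregate([clip_encoder(clip) for clip in clips], mode)


if __name__ == "__main__":
    # Toy encoder: first frame index and clip length as a 2-d "feature".
    toy_encoder = lambda clip: [float(clip[0]), float(len(clip))]
    print(encode_video(48, toy_encoder))
```

In this sketch the same encoder is applied to every clip, so the cost grows with the number of sampled clips rather than with the full video length; that is one plausible way to keep whole-video encoding cheap, though the paper's actual design may differ.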

Taxonomy

Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Gait Recognition and Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings