A Real-time Action Representation with Temporal Encoding and Deep Compression
Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan

TL;DR
This paper introduces T-C3D, a real-time 3D convolutional neural network for video action recognition that combines hierarchical temporal encoding and deep compression to achieve high accuracy and speed in practical applications.
Contribution
The paper presents a novel real-time convolutional architecture, T-C3D, that integrates a residual 3D CNN, temporal encoding, and deep compression for efficient and effective action recognition.
Findings
Achieves 5.4% higher accuracy on the UCF101 benchmark than comparable real-time methods.
Runs twice as fast as comparable real-time methods.
Model size is less than 5MB, enabling deployment on resource-limited devices.
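To give a feel for what a sub-5MB storage budget implies, the following minimal sketch computes how many weights fit in a given budget at different bit-widths. The summary does not say which compression technique or bit-width T-C3D uses, so the function name `max_params_for_budget` and the bit-widths shown are illustrative assumptions, and overhead such as codebooks or index tables is ignored.

```python
def max_params_for_budget(budget_mb: float, bits_per_weight: int) -> int:
    """Return the largest number of weights that fits in the budget.

    Assumption: 1 MB = 10**6 bytes, and every weight costs exactly
    bits_per_weight bits with no extra metadata.
    """
    budget_bits = int(budget_mb * 1_000_000) * 8
    return budget_bits // bits_per_weight


if __name__ == "__main__":
    # Hypothetical bit-widths: float32, 8-bit quantized, binary weights.
    for bits in (32, 8, 1):
        print(f"{bits:>2} bits/weight -> {max_params_for_budget(5, bits):,} weights")
```

Under these assumptions, a 5MB file holds about 1.25 million 32-bit weights, so fitting a larger network into that budget requires fewer bits per weight, fewer weights, or both.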
Abstract
Deep neural networks have achieved remarkable success for video-based action recognition. However, most existing approaches cannot be deployed in practice due to their high computational cost. To address this challenge, we propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation. T-C3D learns video action representations in a hierarchical multi-granularity manner while achieving a high processing speed. Specifically, we propose a residual 3D Convolutional Neural Network (CNN) to capture complementary information on the appearance of a single frame and the motion between consecutive frames. Based on this CNN, we develop a new temporal encoding method to explore the temporal dynamics of the whole video. Furthermore, we integrate deep compression techniques with T-C3D to further accelerate the deployment of models via…
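The idea of encoding the temporal dynamics of the whole video on top of a clip-level CNN can be illustrated with a minimal sketch: split the video into segments, take one short clip from each, encode every clip with the same network, and aggregate the clip features into a single video-level vector. The abstract does not specify the sampling scheme or the aggregation function, so the centre-of-segment sampling, the mean aggregation, and all names below (`sample_clips`, `aggregate_mean`, `encode_video`, `toy_encoder`) are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Clip = List[int]        # frame indices belonging to one clip
Feature = List[float]   # clip-level or video-level feature vector


def sample_clips(num_frames: int, num_segments: int, clip_len: int) -> List[Clip]:
    """Take one clip of clip_len consecutive frames from the centre of each segment."""
    if num_segments < 1 or clip_len < 1 or num_frames < clip_len:
        raise ValueError("need num_segments >= 1 and num_frames >= clip_len >= 1")
    seg_len = num_frames / num_segments
    clips = []
    for s in range(num_segments):
        centre = int((s + 0.5) * seg_len)
        # Clamp so the clip stays inside the video.
        start = min(max(centre - clip_len // 2, 0), num_frames - clip_len)
        clips.append(list(range(start, start + clip_len)))
    return clips


def aggregate_mean(features: Sequence[Feature]) -> Feature:
    """Element-wise mean of the clip features (one assumed aggregation choice)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def encode_video(num_frames: int, num_segments: int, clip_len: int,
                 clip_encoder: Callable[[Clip], Feature]) -> Feature:
    """Encode every sampled clip with the same encoder, then aggregate."""
    clips = sample_clips(num_frames, num_segments, clip_len)
    return aggregate_mean([clip_encoder(clip) for clip in clips])


def toy_encoder(clip: Clip) -> Feature:
    """Stand-in for a 3D CNN: returns the first and last frame index as floats."""
    return [float(clip[0]), float(clip[-1])]


if __name__ == "__main__":
    print(sample_clips(48, 3, 8))
    print(encode_video(48, 3, 8, toy_encoder))
```

In a real system `clip_encoder` would be the shared 3D CNN applied to pixel data. Because every clip goes through the same weights, the cost grows linearly with the number of segments rather than with the number of frames.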
Taxonomy
Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Gait Recognition and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
