Long-term Temporal Convolutions for Action Recognition

Gül Varol; Ivan Laptev; Cordelia Schmid

arXiv:1604.04494 · cs.CV · June 5, 2017 · 137 cites

TL;DR

This paper introduces long-term temporal convolutions in neural networks to better capture full-duration actions in videos, significantly improving recognition accuracy on benchmark datasets.

Contribution

It proposes LTC-CNN models with extended temporal receptive fields and highlights the importance of high-quality optical flow for action recognition.

Findings

1. Achieved state-of-the-art accuracy on UCF101 (92.7%)
2. Achieved state-of-the-art accuracy on HMDB51 (67.2%)
3. LTC-CNN models with increased temporal extents improve action recognition accuracy

Abstract

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure. Recent methods attempt to capture this structure and learn action representations with convolutional neural networks. Such representations, however, are typically learned at the level of a few video frames, failing to model actions at their full temporal extent. In this work we learn video representations using neural networks with long-term temporal convolutions (LTC). We demonstrate that LTC-CNN models with increased temporal extents improve the accuracy of action recognition. We also study the impact of different low-level representations, such as raw values of video pixels and optical flow vector fields, and demonstrate the importance of high-quality optical flow estimation for learning accurate action models. We report state-of-the-art results on two challenging benchmarks for human…
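
To make the notion of "temporal extent" concrete, the following is a minimal sketch, not the authors' code. It computes how many seconds of video a clip of a given length covers, and how the time axis shrinks through one temporal convolution or pooling step. The frame rate, the clip lengths, the kernel sizes, and both function names are assumptions chosen for illustration; none of them come from this page.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Seconds of video covered by a clip of num_frames consecutive frames."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def temporal_output_length(num_frames: int, kernel: int,
                           stride: int = 1, padding: int = 0) -> int:
    """Length of the time axis after one convolution or pooling step.

    Standard output-size formula for a sliding window; applied here to the
    temporal dimension only.
    """
    return (num_frames + 2 * padding - kernel) // stride + 1


# Assumed values for illustration only (not taken from the page).
ASSUMED_FPS = 25.0
ASSUMED_CLIP_LENGTHS = (16, 60, 100)

durations = {t: clip_duration_seconds(t, ASSUMED_FPS) for t in ASSUMED_CLIP_LENGTHS}

for t, seconds in durations.items():
    # Hypothetical layer: temporal kernel 3 with padding 1, then pooling by 2.
    after_conv = temporal_output_length(t, kernel=3, stride=1, padding=1)
    after_pool = temporal_output_length(after_conv, kernel=2, stride=2)
    print(f"{t} frames cover {seconds:.2f} s; time axis {t} -> {after_conv} -> {after_pool}")
```

Under these assumed values, a 16-frame clip covers 0.64 s while a 100-frame clip covers 4 s, which is the scale of the "several seconds" the abstract attributes to typical human actions.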

Code & Models

Repositories

gulvarol/ltc
torch

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications