Sequential Deep Trajectory Descriptor for Action Recognition with Three-stream CNN

Yemin Shi; Yonghong Tian; Yaowei Wang; Tiejun Huang

arXiv:1609.03056·cs.CV·February 13, 2017

TL;DR

This paper introduces sDTD, a long-term motion descriptor, and integrates it as a third stream into a three-stream CNN framework. The framework captures static spatial features, short-term motion, and long-term motion together, which improves action recognition accuracy.

Contribution

The paper proposes the sequential Deep Trajectory Descriptor (sDTD) for long-term motion representation and integrates it into a three-stream CNN framework for enhanced action recognition.

Findings

01

Achieves state-of-the-art results on the KTH and UCF101 datasets.

02

Performs comparably to top methods on the HMDB51 dataset.

03

Effectively captures static, short-term, and long-term motion features.

Abstract

Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential Deep Trajectory Descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion and long-term motion in the video. Extensive experiments were conducted on three…
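
The projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the array layout `(N, L, 2)`, the point-counting rasterization, the even split into temporal segments, and the function names `project_trajectories` and `plane_sequence` are all assumptions made for illustration. It only shows one plausible way to turn tracked trajectories into a sequence of 2D planes that a CNN-RNN could then consume.

```python
import numpy as np


def project_trajectories(trajectories, height, width):
    """Rasterize trajectory points onto a single 2D plane.

    trajectories: array of shape (N, L, 2) with (x, y) pixel coordinates of
    N trajectories tracked over L frames (hypothetical layout, not taken
    from the paper).
    Returns a (height, width) float32 image in which each pixel counts the
    trajectory points that fall on it.
    """
    plane = np.zeros((height, width), dtype=np.float32)
    points = np.asarray(trajectories, dtype=np.float64).reshape(-1, 2)
    xs = np.rint(points[:, 0]).astype(int)
    ys = np.rint(points[:, 1]).astype(int)
    # Drop points that fall outside the image bounds.
    inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
    # np.add.at accumulates correctly when several points hit the same pixel.
    np.add.at(plane, (ys[inside], xs[inside]), 1.0)
    return plane


def plane_sequence(trajectories, height, width, num_segments):
    """Split trajectories along time and project each segment.

    The even temporal split is an assumption for illustration. The result
    has shape (num_segments, height, width): one plane per segment, in
    temporal order.
    """
    chunks = np.array_split(np.asarray(trajectories), num_segments, axis=1)
    return np.stack([project_trajectories(c, height, width) for c in chunks])
```

For example, `plane_sequence(trajectories, 224, 224, 8)` would yield eight 224×224 planes in temporal order; the sizes here are placeholders, not values from the paper.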
