A Temporal Sequence Learning for Action Recognition and Prediction

Sangwoo Cho; Hassan Foroosh

arXiv:1906.06813·cs.CV·June 18, 2019

TL;DR

This paper introduces a novel approach for human action recognition and prediction by representing videos as sequences of words and applying a Temporal CNN to learn their temporal order, achieving high accuracy with low latency.

Contribution

The work presents a method that models a video as a sentence of visual words and uses a Temporal CNN to recognize and predict actions from partial video sequences.

Findings

01. Achieves 95% accuracy with half the video frames on UCF101 and HMDB51.

02. Demonstrates low-latency prediction capability.

03. Attains state-of-the-art performance at full sequence completion.

Abstract

In this work (supported in part by the National Science Foundation under grant IIS-1212948), we present a method to represent a video with a sequence of words, and learn the temporal sequencing of such words as the key information for predicting and recognizing human actions. We leverage core concepts from the Natural Language Processing (NLP) literature used in sentence classification to solve the problems of action prediction and action recognition. Each frame is converted into a word that is represented as a vector using the Bag of Visual Words (BoW) encoding method. The words are then combined into a sentence that represents the video. The sequences of words in different actions are learned with a simple but effective Temporal Convolutional Neural Network (T-CNN) that captures the temporal sequencing of information in a video sentence. We…
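The pipeline described in the abstract (frame to BoW word vector, words to video sentence, temporal convolution over the sentence) can be sketched roughly as follows. This is a minimal illustration in NumPy under stated assumptions, not the authors' implementation: the codebook, local descriptors, and network weights are random placeholders, the sizes are arbitrary, and the convolution, ReLU, and max-over-time pooling layout is borrowed from common sentence-classification CNNs rather than taken from the paper. All function names are hypothetical.

```python
import numpy as np


def bow_encode(descriptors, codebook):
    """Encode one frame as an L1-normalized histogram over K visual words."""
    # Squared distance from every local descriptor to every codeword: (n, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()


def video_to_sentence(frames, codebook):
    """Stack per-frame word vectors into a (T, K) video sentence."""
    return np.stack([bow_encode(f, codebook) for f in frames])


def temporal_conv_features(sentence, filters):
    """Valid 1-D convolution along time, ReLU, then max-over-time pooling."""
    num_frames = sentence.shape[0]
    width = filters.shape[1]
    # Sliding windows of `width` consecutive words: (T - width + 1, width, K).
    windows = np.stack(
        [sentence[t:t + width] for t in range(num_frames - width + 1)]
    )
    responses = np.einsum("twk,fwk->tf", windows, filters)
    return np.maximum(responses, 0.0).max(axis=0)


def classify(sentence, filters, weights, bias):
    """Return unnormalized class scores for a (possibly partial) sentence."""
    return temporal_conv_features(sentence, filters) @ weights + bias


# Toy data; every size and value below is an assumption for illustration only.
rng = np.random.default_rng(0)
K, D, T, F, WIDTH, CLASSES = 16, 8, 10, 4, 3, 5
codebook = rng.normal(size=(K, D))                      # stand-in for a learned codebook
frames = [rng.normal(size=(20, D)) for _ in range(T)]   # 20 descriptors per frame
filters = rng.normal(size=(F, WIDTH, K))                # untrained temporal filters
weights = rng.normal(size=(F, CLASSES))
bias = np.zeros(CLASSES)

sentence = video_to_sentence(frames, codebook)
scores_full = classify(sentence, filters, weights, bias)
# Prediction setting: only the first half of the frames has been observed.
scores_half = classify(sentence[: T // 2], filters, weights, bias)
```

Because max-over-time pooling yields a fixed-length feature regardless of how many frames are available, the same classifier can score a partial sentence, which is one plausible way to support prediction before the video ends. A partial sentence must still contain at least WIDTH frames for the convolution to produce any output.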
