Recognition of Visually Perceived Compositional Human Actions by Multiple Spatio-Temporal Scales Recurrent Neural Networks

Haanvid Lee, Minju Jung, and Jun Tani

arXiv:1602.01921·cs.CV·February 23, 2017

TL;DR

This paper introduces MSTRNN, a neural network model that recognizes human actions in video by imposing spatial and temporal constraints at multiple scales across its layers, with the aim of capturing the compositional structure of those actions.

Contribution

The paper presents the MSTRNN model, which adds multiple timescale recurrent dynamics to a convolutional neural network so that it can better capture hierarchical and compositional structure in human actions.
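
To make "multiple timescale recurrent dynamics" concrete, here is a minimal sketch that assumes the leaky-integrator form commonly used in multiple timescale recurrent networks: each unit has a time constant tau, and a larger tau makes its internal state change more slowly. The names `leaky_step` and `run` and the tau values are illustrative assumptions, not taken from the paper. The paper's model applies such dynamics to convolutional feature maps, which this scalar example leaves out.

```python
def leaky_step(u_prev, drive, tau):
    """One leaky-integrator update (assumed form, not quoted from the paper).

    u_prev: previous internal state of the unit
    drive:  summed input to the unit at this time step
    tau:    time constant (>= 1); larger tau gives slower dynamics
    """
    return (1.0 - 1.0 / tau) * u_prev + (1.0 / tau) * drive


def run(tau, drive=1.0, steps=10):
    """Drive a single unit with a constant input, starting from zero."""
    u = 0.0
    for _ in range(steps):
        u = leaky_step(u, drive, tau)
    return u


if __name__ == "__main__":
    # Hypothetical time constants for a "fast" and a "slow" unit.
    print("fast unit (tau=2):  ", run(2.0))
    print("slow unit (tau=100):", run(100.0))
```

With the same constant input, the unit with the small time constant almost reaches the input value within ten steps, while the unit with the large time constant has barely moved. Giving different layers different time constants is one way to let lower layers follow fast changes while higher layers integrate over longer spans.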

Findings

01 MSTRNN outperforms other deep models on action datasets.

02 Internal representations reveal development of functional hierarchies.

03 Model effectively captures compositionality in human actions.

Abstract

This paper proposes a novel neural network model for recognizing visually perceived human actions. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by introducing multiple timescale recurrent dynamics into the conventional convolutional neural network model. An essential characteristic of the MSTRNN is that its architecture simultaneously imposes spatial and temporal constraints on neural activity, and these constraints vary in scale across layers. Following the principle of upward and downward causation, the network is assumed to develop meaningful structures, such as a functional hierarchy, by exploiting these constraints during learning. To evaluate the characteristics of the model, the current study uses three types of human action video datasets consisting of different types…
