Learning Latent Spatio-Temporal Compositional Model for Human Action Recognition

Xiaodan Liang; Liang Lin; Liangliang Cao

arXiv:1502.00258 · cs.CV · February 3, 2015

TL;DR

This paper introduces the Spatio-Temporal And-Or Graph (STAOG), a compositional model for human action recognition that represents the latent structure of actions in video as temporal anchor frames decomposed into deformable parts, and trains it with a weakly supervised learning algorithm.

Contribution

The paper presents a new hierarchical spatio-temporal model with a weakly supervised learning approach for improved action recognition accuracy.

Findings

01. Outperforms existing methods on challenging datasets.

02. Effectively handles large intra-class variance.

03. Models complex spatio-temporal interactions.

Abstract

Action recognition is an important problem in multimedia understanding. This paper addresses this problem by building an expressive compositional action model. We model one action instance in the video with an ensemble of spatio-temporal compositions: a number of discrete temporal anchor frames, each of which is further decomposed into a layout of deformable parts. In this way, our model can identify a Spatio-Temporal And-Or Graph (STAOG) to represent the latent structure of actions, e.g., triple jumping, swinging, and high jumping. The STAOG model comprises four layers: (i) a batch of leaf-nodes at the bottom for detecting various action parts within video patches; (ii) the or-nodes above them, i.e., switch variables that activate their children leaf-nodes for structural variability; (iii) the and-nodes within an anchor frame for verifying spatial composition; and (iv) the root-node at the top for…
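To make the layered structure in the abstract more concrete, the following is a minimal Python sketch of the hierarchy, not the authors' implementation. All class names, the max-based child selection in the or-node, and the additive scoring in the and-node are assumptions made for illustration. The abstract is truncated at the root-node's role, so the sum used at the root is purely a placeholder. Deformation costs and learned weights are omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LeafNode:
    """Stand-in for a part detector response on one video patch."""
    score: float


@dataclass
class OrNode:
    """Switch variable that activates one of its child leaf-nodes."""
    children: List[LeafNode]

    def best(self) -> Tuple[int, float]:
        # ASSUMPTION: the active child is the highest-scoring one.
        idx = max(range(len(self.children)),
                  key=lambda i: self.children[i].score)
        return idx, self.children[idx].score


@dataclass
class AndNode:
    """Spatial composition of parts within a single anchor frame."""
    parts: List[OrNode]

    def score(self) -> float:
        # ASSUMPTION: part scores are simply added; a real model would
        # also account for the spatial layout of the deformable parts.
        return sum(part.best()[1] for part in self.parts)


@dataclass
class RootNode:
    """Top layer spanning the temporal anchor frames."""
    anchors: List[AndNode]

    def score(self) -> float:
        # PLACEHOLDER: the source text is cut off before describing the
        # root-node, so this plain sum over anchor frames is hypothetical.
        return sum(anchor.score() for anchor in self.anchors)


# Toy instance: two anchor frames with made-up part scores.
frame1 = AndNode([OrNode([LeafNode(0.25), LeafNode(0.75)]),
                  OrNode([LeafNode(0.5)])])
frame2 = AndNode([OrNode([LeafNode(0.125), LeafNode(0.25)])])
root = RootNode([frame1, frame2])

print(frame1.parts[0].best())  # index and score of the activated leaf
print(root.score())            # total score under the assumptions above
```

In this toy instance the first or-node of `frame1` activates its second child, which illustrates how switch variables let one model cover several structural variants of the same action part.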
