Compositional Structure Learning for Action Understanding
Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso

TL;DR
This paper introduces a compositional model for comprehensive action understanding that captures long-range, articulated human motion and outperforms existing methods in detection, localization, and recognition tasks.
Contribution
It proposes a novel mid-level representation called compositional trajectories and a structured deformable parts model for improved action understanding.
Findings
State-of-the-art performance on action detection
Effective separation of human and camera motion
Robust recognition across diverse actions
Abstract
The focus of the action understanding literature has predominantly been classification; however, there are many applications demanding richer action understanding, such as mobile robotics and video search, with solutions to classification, localization and detection. In this paper, we propose a compositional model that leverages a new mid-level representation called compositional trajectories and a locally articulated spatiotemporal deformable parts model (LASTDPM) for full action understanding. Our method is advantageous in capturing the variable structure of dynamic human activity over a long range. First, the compositional trajectories capture long-ranging, frequently co-occurring groups of trajectories in space time and represent them in discriminative hierarchies, where human motion is largely separated from camera motion; second, LASTDPM learns a structured model with…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Gait Recognition and Analysis
