Cross-view Action Modeling, Learning and Recognition
Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu

TL;DR
This paper introduces MST-AOG, a hierarchical model for cross-view action recognition. It is trained with 3D skeleton data but recognizes actions in 2D videos captured from unseen views, improving cross-view accuracy and robustness to viewpoint changes.
Contribution
The paper presents a novel multiview spatio-temporal AND-OR graph model that effectively captures cross-view action variations and enables recognition from unseen views without requiring 3D data during testing.
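As a rough illustration of how an AND-OR graph can be scored, the sketch below runs a generic max-sum pass over a toy graph: an AND node sums the scores of its children (all parts must be present), and an OR node keeps its best-scoring child (one alternative is selected). This is not the paper's MST-AOG inference. The `Node` class, the `best_parse` function, the node names, and all scores are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical node type for a toy AND-OR graph (not the paper's code).
    kind: str                      # "and", "or", or "leaf"
    name: str
    score: float = 0.0             # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def best_parse(node: Node) -> Tuple[float, List[str]]:
    """Return the best score and the names of the branches chosen at OR nodes."""
    if node.kind == "leaf":
        return node.score, []
    results = [best_parse(child) for child in node.children]
    if node.kind == "and":
        # AND node: every child contributes, so scores add up.
        total = sum(s for s, _ in results)
        picks = [p for _, chosen in results for p in chosen]
        return total, picks
    if node.kind == "or":
        # OR node: select the single best-scoring alternative.
        idx = max(range(len(results)), key=lambda i: results[i][0])
        s, picks = results[idx]
        return s, [node.children[idx].name] + picks
    raise ValueError(f"unknown node kind: {node.kind}")


# Toy graph: the root chooses between two assumed view branches,
# each of which is an AND over two made-up part scores.
view_a = Node("and", "view_a", children=[
    Node("leaf", "arm", 0.5), Node("leaf", "leg", 0.25)])
view_b = Node("and", "view_b", children=[
    Node("leaf", "arm", 0.125), Node("leaf", "leg", 1.0)])
root = Node("or", "action", children=[view_a, view_b])

score, picks = best_parse(root)
print(score, picks)
```

In this toy setup the OR node at the root plays the role of choosing a viewpoint hypothesis, which is one simple way to picture recognition when the view is unknown at test time.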
Findings
Significant accuracy improvement in cross-view recognition
Robustness to view variations demonstrated
Creation of a new Multiview Action3D dataset
Abstract
Existing methods on video-based action recognition are generally view-dependent, i.e., performing recognition from the same views seen in the training data. We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., the recognition is performed on the video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling the geometry, appearance and motion variations. This paper proposes effective methods to learn the structure and parameters of MST-AOG. The inference based on MST-AOG enables action recognition from novel views. The training of MST-AOG takes advantage of the 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous multi-view video frames, which is error-prone and…
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Diabetic Foot Ulcer Assessment and Management
