Action Machine: Rethinking Action Recognition in Trimmed Videos
Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Junjie Huang, Guan Huang, Dalong Du

TL;DR
This paper introduces Action Machine, a person-centric framework for action recognition in trimmed videos that combines pose estimation with RGB input and achieves state-of-the-art results on NTU RGB-D.
Contribution
The paper proposes a conceptually simple framework that extends I3D with a human pose estimation branch, a 2D CNN for pose-based action recognition, and fusion of the RGB and pose predictions.
Findings
Achieves top-1 accuracies of 97.2% (cross-view) and 94.3% (cross-subject) on NTU RGB-D
Outperforms previous methods on multiple datasets
Fast to train and test; benefits from multi-task training of action recognition and pose estimation
Abstract
Existing methods in video action recognition mostly do not distinguish the human body from the environment and easily overfit to scenes and objects. In this work, we present a conceptually simple, general and high-performance framework for action recognition in trimmed videos, aiming at person-centric modeling. The method, called Action Machine, takes as inputs the videos cropped by person bounding boxes. It extends the Inflated 3D ConvNet (I3D) by adding a branch for human pose estimation and a 2D CNN for pose-based action recognition, and is fast to train and test. Action Machine benefits from the multi-task training of action recognition and pose estimation, and from the fusion of predictions from RGB images and poses. On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively. Action Machine also…
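The abstract mentions fusing predictions from RGB images and poses but, as shown here, does not say how. The following is a minimal sketch of one common option, weighted late fusion of per-class probabilities; it is an assumption for illustration, not the authors' confirmed method. The function names and the `rgb_weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(rgb_logits, pose_logits, rgb_weight=0.5):
    """Weighted average of the class probabilities from two streams.

    rgb_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    if len(rgb_logits) != len(pose_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must be in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_pose = softmax(pose_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b
            for a, b in zip(p_rgb, p_pose)]


def predict(rgb_logits, pose_logits, rgb_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_predictions(rgb_logits, pose_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, `predict([2.0, 1.0, 0.0], [0.0, 1.0, 5.0])` lets a confident pose stream override a weaker RGB preference, while `rgb_weight=1.0` reduces the result to the RGB stream alone. In practice the weight would be tuned on a validation split.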
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Diabetic Foot Ulcer Assessment and Management
