Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

TL;DR
This paper introduces a method that encodes 3D skeleton sequences into 2D images, called Joint Trajectory Maps (JTMs), so that ConvNets can perform real-time human action recognition. It reports state-of-the-art results on three public benchmarks.
Contribution
It proposes a compact, simple encoding of the spatio-temporal information in skeleton sequences as 2D images that ConvNets can classify, addressing the open problem of applying ConvNets to video-based recognition.
Findings
Achieved state-of-the-art results on three public benchmarks: MSRC-12, G3D, and UTD-MHAD.
Demonstrated that JTMs combined with ConvNets support real-time action recognition.
Validated the method on both gesture data (MSRC-12) and action data (G3D, UTD-MHAD).
Abstract
Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode spatio-temporal information carried in skeleton sequences into multiple images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
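The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name joint_trajectory_maps, the input layout (frames, joints, xyz), the choice of three orthogonal projection planes, and the rule that later frames are drawn brighter are all assumptions made for illustration. The sketch also plots joint positions as points rather than connected trajectory segments, and it does not reproduce whatever color encoding the paper uses.

```python
import numpy as np

def joint_trajectory_maps(seq, size=64):
    """Hypothetical helper: project a skeleton sequence onto three planes.

    seq  : array-like of shape (T, J, 3) with x, y, z per joint per frame.
    size : side length of each square output image (assumed parameter).
    Returns a list of three (size, size) float arrays (xy, xz, yz planes).
    """
    seq = np.asarray(seq, dtype=float)
    num_frames = seq.shape[0]

    # Normalize each coordinate axis to [0, 1] over the whole sequence.
    lo = seq.min(axis=(0, 1))
    hi = seq.max(axis=(0, 1))
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    pix = np.rint((seq - lo) / span * (size - 1)).astype(int)

    maps = []
    for a, b in ((0, 1), (0, 2), (1, 2)):  # xy, xz, yz projections
        img = np.zeros((size, size))
        for t in range(num_frames):
            # Assumed temporal encoding: intensity grows with frame index.
            value = (t + 1) / num_frames
            cols = pix[t, :, a]
            rows = pix[t, :, b]
            # Keep the brightest (latest) value where trajectories overlap.
            img[rows, cols] = np.maximum(img[rows, cols], value)
        maps.append(img)
    return maps
```

Each returned array could then be resized and passed to an image ConvNet. Calling joint_trajectory_maps(np.random.rand(30, 20, 3)) returns three 64 x 64 maps for a 30-frame, 20-joint sequence.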
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
