Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN
Bo Li, Mingyi He, Xuelian Cheng, Yucheng Chen, Yuchao Dai

TL;DR
This paper introduces an approach to skeleton-based action recognition that converts skeleton videos into translation-scale invariant colour images and classifies them with multi-scale deep CNNs, achieving state-of-the-art results on multiple datasets.
Contribution
It proposes a dataset-independent, translation-scale invariant image mapping for skeleton videos and a multi-scale CNN architecture that is built on and fine-tuned from pre-trained CNNs (e.g., AlexNet, VGGNet, ResNet).
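As a rough illustration of what a translation-scale invariant mapping can look like, the sketch below turns a skeleton sequence into a small RGB image. It is not the authors' exact formulation: the function name `skeleton_to_image`, the layout (joints as rows, frames as columns, x/y/z as R/G/B), and the choice of a single shared scale across the three axes are all assumptions made for this example. Plain Python lists are used to keep it dependency-free; an array library would be the usual choice in practice.

```python
def skeleton_to_image(frames):
    """Map a skeleton sequence to an RGB image stored as nested lists.

    frames: list of T frames, each a list of J joints, each an (x, y, z) tuple.
    Returns J rows by T columns of (r, g, b) integer pixels in 0..255.
    Hypothetical helper for illustration only.
    """
    axes = range(3)
    # Per-axis minimum and maximum over the whole sequence.
    mins = [min(joint[a] for frame in frames for joint in frame) for a in axes]
    maxs = [max(joint[a] for frame in frames for joint in frame) for a in axes]

    # Assumption: one shared scale (the largest axis range), so that the
    # proportions between axes are preserved. Falls back to 1.0 when every
    # joint sits at the same point.
    scale = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0

    num_joints = len(frames[0])
    image = []
    for j in range(num_joints):
        row = []
        for frame in frames:
            # Subtracting the minimum removes translation; dividing by the
            # shared range removes scale.
            pixel = tuple(
                round(255 * (frame[j][a] - mins[a]) / scale) for a in axes
            )
            row.append(pixel)
        image.append(row)
    return image


# Two frames, two joints each.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
    [(2.0, 4.0, 1.0), (3.0, 1.0, 2.0)],
]
image = skeleton_to_image(frames)

# The same motion, scaled by 2 and shifted, maps to the same image.
moved = [[(2 * x + 5, 2 * y - 3, 2 * z + 7) for x, y, z in f] for f in frames]
same_image = skeleton_to_image(moved)
```

Because the normalisation uses only statistics of the sequence itself, it needs no dataset-wide constants, which is in the spirit of the "dataset independent" property the paper claims. The resulting image could then be resized and fed to a pre-trained CNN.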
Findings
Achieves state-of-the-art results on the NTU RGB+D, UTD-MHAD, MSRC-12, and G3D datasets.
Outperforms existing approaches by large margins on large datasets.
Works on both 3D and 2D skeleton video data.
Abstract
This paper presents an image-classification-based approach to the skeleton-based video action recognition problem. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, ResNet, etc. Even though skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms…
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
