Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN

Bo Li; Mingyi He; Xuelian Cheng; Yucheng Chen; Yuchao Dai

arXiv:1704.05645·cs.CV·June 14, 2017·51 cites

Open Access

TL;DR

This paper introduces a novel approach for skeleton-based action recognition by converting skeleton videos into invariant images and applying multi-scale deep CNNs, achieving state-of-the-art results on multiple datasets.

Contribution

It proposes a translation-scale invariant image mapping for skeleton videos and a multi-scale CNN architecture fine-tuned on pre-trained models, enhancing recognition accuracy.

Findings

1. Achieves state-of-the-art results on the NTU RGB+D, UTD-MHAD, and MSRC-12 datasets.

2. Outperforms existing approaches by a large margin on the larger datasets.

3. Works on both 3D and 2D skeleton video data.

Abstract

This paper presents an image-classification-based approach to the skeleton-based video action recognition problem. First, a dataset-independent, translation-scale invariant image mapping method is proposed, which transforms skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built on and fine-tuned from powerful pre-trained CNNs such as AlexNet, VGGNet, and ResNet. Even though skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on popular benchmark datasets, e.g. NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms…
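The abstract names the translation-scale invariant mapping but does not spell out the arithmetic. One way such a mapping might look is sketched below, under stated assumptions that are not taken from the text above: each coordinate axis is shifted by its minimum over the whole sequence (which removes translation), all axes are divided by one shared span (which removes scale), joints become image rows, and frames become image columns. The function name `skeleton_to_image` is hypothetical.

```python
from math import floor


def skeleton_to_image(seq):
    """Map a skeleton sequence to an RGB-like image (illustrative sketch).

    seq: list of frames; each frame is a list of joints; each joint is
         an (x, y, z) tuple.
    Returns image[joint][frame] = (r, g, b) with integer values in 0..255.

    Assumption: min-shift per axis and one shared span for all axes.
    This layout and normalisation are illustrative, not confirmed by
    the abstract.
    """
    points = [p for frame in seq for p in frame]
    mins = [min(p[k] for p in points) for k in range(3)]
    maxs = [max(p[k] for p in points) for k in range(3)]
    # A single divisor for all three axes keeps the skeleton's proportions.
    # Fall back to 1.0 for a degenerate sequence where every point coincides.
    span = max(hi - lo for hi, lo in zip(maxs, mins)) or 1.0
    n_joints = len(seq[0])
    return [
        [
            tuple(floor(255 * (frame[j][k] - mins[k]) / span) for k in range(3))
            for frame in seq
        ]
        for j in range(n_joints)
    ]


# Two frames, two joints each.
seq = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
    [(2.0, 4.0, 1.0), (3.0, 1.0, 2.0)],
]
# The same motion, scaled by 2 and shifted by a constant offset.
moved = [[(2 * x + 10, 2 * y - 3, 2 * z + 7) for x, y, z in frame] for frame in seq]

image = skeleton_to_image(seq)
same = image == skeleton_to_image(moved)
```

In this sketch `same` is `True` because the shift cancels the offset and the shared span cancels the scale factor. The resulting image could then be resized and passed to an image classifier, which is the role the pre-trained CNNs play in the abstract.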

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Hand Gesture Recognition Systems

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection