# Learning to Recognize 3D Human Action from A New Skeleton-based   Representation Using Deep Convolutional Neural Networks

**Authors:** Huy-Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, and Sergio A. Velastin

arXiv: 1812.10550 · 2018-12-31

## TL;DR

This paper introduces a novel skeleton-based 3D human action recognition method that transforms joint coordinates into RGB images for deep learning, achieving superior accuracy with less computation.

## Contribution

The paper proposes a new RGB image encoding of skeleton sequences and applies deep CNNs, improving recognition accuracy and efficiency over previous methods.

## Key findings

- Outperforms state-of-the-art on MSR Action3D and NTU-RGB+D datasets.
- Requires less computation for training and prediction.
- Effective representation of complex 3D motions.

## Abstract

Recognizing human actions in untrimmed videos is an important challenging task. An effective 3D motion representation and a powerful learning model are two key factors influencing recognition performance. In this paper we introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a color encoding process. By normalizing the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the color-coded representation is able to represent spatio-temporal evolutions of complex 3D motions, independently of the length of each sequence. We then design and train different Deep Convolutional Neural Networks (D-CNNs) based on the Residual Network architecture (ResNet) on the obtained image-based representations to learn 3D motion features and classify them into classes. Our method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches whilst requiring less computation for training and prediction.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10550/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10550/full.md

## References

94 references — full list in the complete paper: https://tomesphere.com/paper/1812.10550/full.md

---
Source: https://tomesphere.com/paper/1812.10550