# Skeleton-based Action Recognition with Convolutional Neural Networks

**Authors:** Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

arXiv: 1704.07595 · 2017-05-03

## TL;DR

This paper introduces a CNN-based framework for skeleton-based action classification and detection. A skeleton transformer module rearranges and selects joints automatically, and the method reaches 89.3% accuracy on NTU RGB+D and 93.7% mAP on PKU-MMD.

## Contribution

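It presents a CNN framework that takes raw skeleton coordinates and skeleton motion as input, adds a skeleton transformer module that rearranges and selects important joints automatically, and uses a window proposal network for detection in untrimmed videos. The framework is offered as an alternative to the RNN-based approaches that dominate the state of the art.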

## Key findings

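- Achieved 89.3% accuracy on the validation set of the NTU RGB+D dataset with a simple 7-layer network.
- Attained 93.7% mAP on the PKU-MMD dataset for action detection in untrimmed videos.
- Surpassed the PKU-MMD baseline by a large margin.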
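
The mAP figure is a detection metric: a predicted temporal segment usually counts as correct only when it overlaps a ground-truth segment by more than some threshold. The sketch below shows that overlap measure (temporal intersection-over-union) under the assumption that segments are `(start, end)` frame pairs. The function name `temporal_iou` is hypothetical, and the threshold the paper uses is not stated in this summary.

```python
def temporal_iou(pred, truth):
    """Intersection-over-union of two (start, end) intervals on the time axis."""
    p_start, p_end = pred
    t_start, t_end = truth
    # Overlap length is zero when the intervals do not intersect.
    intersection = max(0.0, min(p_end, t_end) - max(p_start, t_start))
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative segments, not taken from PKU-MMD.
print(temporal_iou((10, 50), (30, 70)))
```

A proposal that exactly matches the ground truth scores 1.0, and disjoint segments score 0.0.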

## Abstract

Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural network (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into the CNN for label prediction. A novel skeleton transformer module is designed to rearrange and select important skeleton joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on the validation set of the NTU RGB+D dataset. For action detection in untrimmed videos, we develop a window proposal network to extract temporal segment proposals, which are further classified within the same network. On the recent PKU-MMD dataset, we achieve 93.7% mAP, surpassing the baseline by a large margin.
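
The two inputs and the skeleton transformer described in the abstract can be illustrated with a small NumPy sketch. It rests on assumptions the abstract does not spell out: skeleton motion is taken to be the frame-to-frame difference of joint coordinates, and the transformer is modelled as a linear map over the joint axis (a permutation matrix would rearrange joints, and zero weights would drop them). The function names, the zero-padded first motion frame, and the output joint count `M` are hypothetical; in a real network the weights would be learned rather than sampled at random.

```python
import numpy as np


def skeleton_motion(clip):
    """Frame-to-frame joint displacement for a clip of shape (T, N, 3)."""
    motion = np.zeros_like(clip)
    # The first frame has no predecessor, so its motion stays zero (assumption).
    motion[1:] = clip[1:] - clip[:-1]
    return motion


def skeleton_transformer(clip, weights):
    """Linearly mix N input joints into M output joints; weights has shape (N, M)."""
    # (T, N, 3) combined with (N, M) gives (T, M, 3).
    return np.einsum("tnc,nm->tmc", clip, weights)


rng = np.random.default_rng(0)
T, N, M = 32, 25, 30  # 25 joints as in NTU RGB+D; T and M are illustrative
clip = rng.standard_normal((T, N, 3))
weights = rng.standard_normal((N, M))  # stand-in for learned parameters

raw_stream = skeleton_transformer(clip, weights)
motion_stream = skeleton_transformer(skeleton_motion(clip), weights)
print(raw_stream.shape, motion_stream.shape)
```

Both streams come out with shape `(T, M, 3)`, which is the image-like layout a CNN could then consume.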

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.07595/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1704.07595/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/1704.07595/full.md

---
Source: https://tomesphere.com/paper/1704.07595