# Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition

**Authors:** Yifan Jiang, Han Chen, Hanseok Ko

arXiv: 2302.13434 · 2023-07-26

## TL;DR

This paper presents a data augmentation approach for skeleton-based action recognition in which a denoising diffusion probabilistic model (DDPM), guided by a spatial-temporal transformer (ST-Trans), generates synthetic action sequences. The synthetic data improves the performance of existing action recognition models.

## Contribution

It introduces a new diffusion-based data augmentation method guided by a spatial-temporal transformer for generating realistic skeleton action sequences.

## Key findings

- Outperforms state-of-the-art motion generation methods on naturality and diversity metrics
- Synthetic data improves the performance of existing action recognition models
- Generates diverse and natural action sequences

## Abstract

Skeleton-based human action recognition has recently become an active research topic, because the compact representation of human skeletons opens new directions for the field. As a result, researchers have begun to recognize the value of analyzing human action by extracting skeleton information from RGB or other sensors. Leveraging the rapid development of deep learning (DL), many skeleton-based human action approaches with carefully designed DL architectures have been presented recently. However, a well-trained DL model demands sufficient high-quality data, which is expensive and labor-intensive to obtain. In this paper, we introduce a novel data augmentation method for skeleton-based action recognition tasks that can effectively generate high-quality and diverse sequential actions. To obtain natural and realistic action sequences, we propose denoising diffusion probabilistic models (DDPMs) that generate a series of synthetic action sequences, with the generation process precisely guided by a spatial-temporal transformer (ST-Trans). Experimental results show that our method outperforms state-of-the-art (SOTA) motion generation approaches on different naturality and diversity metrics. The results also show that the high-quality synthetic data can be effectively deployed to existing action recognition models with significant performance improvement.
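To make the DDPM component of the abstract more concrete, here is a minimal sketch of the standard forward (noising) step that a diffusion model learns to reverse, applied to a skeleton clip. It is not the authors' implementation. The linear beta schedule, the 1000 steps, and the clip shape of 16 frames by 25 joints by 3 coordinates are illustrative assumptions, and the function names are hypothetical. The ST-Trans denoiser itself is not shown; in a full model it would be trained to predict `noise` from `xt` and `t`.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


rng = np.random.default_rng(0)
num_steps = 1000
betas = linear_beta_schedule(num_steps)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s) up to step t

# Hypothetical skeleton clip: 16 frames, 25 joints, 3D coordinates.
x0 = rng.standard_normal((16, 25, 3))

# Noised clip at an intermediate step, plus the noise a denoiser would be trained to predict.
xt, noise = forward_diffuse(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

At small `t` the clip is almost unchanged, while at the final step `alpha_bar` is close to zero and `xt` is nearly pure Gaussian noise. Generation runs this process in reverse, starting from noise.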

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13434/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13434/full.md

## References

76 references — full list in the complete paper: https://tomesphere.com/paper/2302.13434/full.md

---
Source: https://tomesphere.com/paper/2302.13434