# ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

**Authors:** Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li

arXiv: 1907.12193 · 2020-07-30

## TL;DR

This paper presents two large-scale RGB-D gesture recognition datasets (IsoGD and ConGD), analyzes current methods, introduces the corrected segmentation rate (CSR) metric for temporal segmentation, and proposes a Bi-LSTM baseline that outperforms prior methods on CSR.

## Contribution

Creation of benchmark datasets for isolated and continuous gesture recognition, analysis of state-of-the-art methods, and introduction of a Bi-LSTM baseline with improved segmentation performance.

## Key findings

- Bi-LSTM baseline outperforms existing methods in CSR by a reported 8.1% (from 0.8917 to 0.9639)
- New datasets facilitate large-scale gesture recognition research
- Analysis highlights challenges and progress in RGB-D gesture recognition
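
As a quick arithmetic check on the CSR figures quoted in the abstract (0.8917 and 0.9639), the sketch below computes both the plain difference and the gain relative to the earlier score. The variable names are illustrative and not taken from the paper. Under this calculation the reported 8.1% appears to correspond to the relative gain, while the plain difference is about 0.0722 (7.22 percentage points).

```python
# CSR values quoted in the abstract.
baseline_csr = 0.8917   # best prior method
bilstm_csr = 0.9639     # proposed Bi-LSTM baseline

# Plain difference between the two scores.
absolute_gain = bilstm_csr - baseline_csr

# Difference expressed as a fraction of the prior score.
relative_gain = absolute_gain / baseline_csr

print(f"absolute gain: {absolute_gain:.4f}")   # absolute gain: 0.0722
print(f"relative gain: {relative_gain:.1%}")   # relative gain: 8.1%
```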

## Abstract

The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than $200$ teams around the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean Jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of $8.1\%$ (from $0.8917$ to $0.9639$) of CSR.
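
To make the Jaccard-based metric mentioned in the abstract more concrete, here is a minimal sketch of a frame-level Jaccard index (intersection over union) between a ground-truth gesture segment and a predicted one, plus a simple mean over several pairs. This is an assumption about the general form only: the function names `jaccard_index` and `mean_jaccard` are hypothetical, and this summary does not specify how the challenge averages over gesture classes and sequences, so the paper's exact MJI definition may differ.

```python
def jaccard_index(true_frames, pred_frames):
    """Intersection over union of two collections of frame indices."""
    true_set, pred_set = set(true_frames), set(pred_frames)
    union = true_set | pred_set
    if not union:
        # Neither segment contains any frame; treat the overlap as zero.
        return 0.0
    return len(true_set & pred_set) / len(union)


def mean_jaccard(pairs):
    """Unweighted mean of the Jaccard index over (truth, prediction) pairs."""
    scores = [jaccard_index(truth, pred) for truth, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative data: a gesture annotated on frames 10..29, predicted on 15..34.
truth = range(10, 30)
prediction = range(15, 35)
print(jaccard_index(truth, prediction))  # 15 shared frames / 25 total -> 0.6
```

A prediction that matches the annotation exactly scores 1.0, and a prediction with no overlapping frames scores 0.0, so the mean rewards both correct labels and accurate temporal boundaries.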

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12193/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12193/full.md

## References

76 references — full list in the complete paper: https://tomesphere.com/paper/1907.12193/full.md

---
Source: https://tomesphere.com/paper/1907.12193