# NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

**Authors:** Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, Alex C. Kot

arXiv: 1905.04757 · 2019-06-11

## TL;DR

This paper introduces NTU RGB+D 120, a large-scale, diverse dataset for 3D human activity recognition, and evaluates existing methods while proposing a novel one-shot recognition framework.

## Contribution

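The paper provides a new dataset with 120 action classes and more than 114,000 video samples (over 8 million frames) recorded from 106 subjects. It also introduces a one-shot 3D activity recognition task on this dataset and proposes the Action-Part Semantic Relevance-aware (APSR) framework to address it.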

## Key findings

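- Among the existing 3D activity analysis methods evaluated on the dataset, deep learning methods show a clear advantage.
- The dataset addresses limitations of earlier depth-based and RGB+D benchmarks: too few training samples and action classes, and limited diversity in camera views, environmental conditions, and subjects.
- The APSR framework yields promising results for recognizing novel action classes in the one-shot setting.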

## Abstract

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
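The abstract names the APSR framework but does not specify it. The sketch below illustrates only the general idea its name suggests, one-shot recognition where each novel class has a single exemplar and body-part features are weighted by how semantically relevant each part is to the action. It is a simplified illustration under stated assumptions, not the paper's formulation: the function names, the softmax over cosine similarities between an action-name embedding and body-part-name embeddings, and the weighted Euclidean nearest-exemplar rule are all hypothetical choices made for this example.

```python
import numpy as np


def part_relevance(action_vec, part_vecs, temperature=1.0):
    """Hypothetical relevance weights: softmax over cosine similarities
    between an action-name embedding (dim,) and body-part-name
    embeddings (num_parts, dim). Returns weights that sum to 1."""
    a = action_vec / np.linalg.norm(action_vec)
    p = part_vecs / np.linalg.norm(part_vecs, axis=1, keepdims=True)
    sims = (p @ a) / temperature
    e = np.exp(sims - sims.max())  # subtract max for numerical stability
    return e / e.sum()


def weighted_part_distance(query_parts, exemplar_parts, weights):
    """Relevance-weighted sum of per-part Euclidean distances between
    two (num_parts, feat_dim) part-feature arrays."""
    per_part = np.linalg.norm(query_parts - exemplar_parts, axis=1)
    return float(np.dot(weights, per_part))


def one_shot_classify(query_parts, exemplars, weights):
    """Assign the query to the novel class whose single exemplar is
    nearest under the weighted distance.

    exemplars: dict class_name -> (num_parts, feat_dim) exemplar features
    weights:   dict class_name -> (num_parts,) relevance weights
    """
    scores = {
        c: weighted_part_distance(query_parts, ex, weights[c])
        for c, ex in exemplars.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Toy setup with random stand-ins for learned part features and
    # word embeddings; none of these values come from the paper.
    rng = np.random.default_rng(0)
    num_parts, feat_dim, emb_dim = 5, 8, 16
    part_names = rng.normal(size=(num_parts, emb_dim))
    classes = ["novel_a", "novel_b"]
    exemplars = {c: rng.normal(size=(num_parts, feat_dim)) for c in classes}
    weights = {c: part_relevance(rng.normal(size=emb_dim), part_names) for c in classes}
    query = exemplars["novel_a"] + 0.05 * rng.normal(size=(num_parts, feat_dim))
    print(one_shot_classify(query, exemplars, weights))
```

Because the weights are computed per novel class from its action name, a part that is semantically unrelated to that action contributes less to the distance, which is the intuition the name "action-part semantic relevance" points to.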

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.04757/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1905.04757/full.md

## References

112 references — full list in the complete paper: https://tomesphere.com/paper/1905.04757/full.md

---
Source: https://tomesphere.com/paper/1905.04757