Human Action Recognition from Various Data Modalities: A Review
Zehua Sun, Qiuhong Ke, Hossein Rahmani, Mohammed Bennamoun, Gang Wang, and Jun Liu

TL;DR
This review paper comprehensively surveys recent deep learning approaches for human action recognition across various data modalities, highlighting methods, fusion techniques, and benchmark results to guide future research.
Contribution
The paper provides a detailed overview of deep learning methods for human action recognition (HAR) across multiple data modalities, covering fusion and co-learning frameworks and a comparative analysis of benchmark results.
Findings
Deep learning methods vary across data modalities.
Fusion-based approaches improve recognition accuracy.
Benchmark results highlight current challenges and future directions.
Abstract
Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications and has therefore been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenario. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this paper, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data…
