Vision and Inertial Sensing Fusion for Human Action Recognition: A Review
Sharmin Majumder, Nasser Kehtarnavaz

TL;DR
This survey reviews methods combining vision and inertial sensing for human action recognition, highlighting fusion techniques, datasets, challenges, and future directions to improve accuracy in real-world applications.
Contribution
It categorizes existing fusion approaches, features, classifiers, and datasets, providing a comprehensive overview of the state-of-the-art in multimodal human action recognition.
Findings
Fusion of vision and inertial sensing improves recognition accuracy.
Various fusion strategies and features are employed across studies.
Challenges include real-world deployment and dataset limitations.
Abstract
Human action recognition is used in many applications, such as video surveillance, human-computer interaction, assistive living, and gaming. Many papers in the literature have shown that fusing vision and inertial sensing improves recognition accuracy compared with using either sensing modality individually. This paper surveys the papers in which vision and inertial sensing are used simultaneously within a fusion framework to perform human action recognition. The surveyed papers are categorized in terms of fusion approaches, features, classifiers, and the multimodal datasets considered. Challenges and possible future directions for deploying the fusion of these two sensing modalities under realistic conditions are also stated.
