HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Md Mofijul Islam, Tariq Iqbal

TL;DR
HAMLET is a deep neural network that uses hierarchical multimodal attention to improve human activity recognition accuracy across multiple datasets, aiding robots in better understanding human actions.
Contribution
This work introduces HAMLET, a novel hierarchical multimodal attention-based neural network for human activity recognition, effectively fusing multimodal data with improved accuracy.
Findings
HAMLET outperformed state-of-the-art algorithms on three datasets.
Highest accuracy of 95.12% on UTD-MHAD and 97.45% on UT-Kinect.
F1-score of 81.52% on UCSD-MIT.
Abstract
To collaborate fluently with people, robots need the ability to recognize human activities accurately. Although modern robots are equipped with various sensors, robust human activity recognition (HAR) remains a challenging task for robots due to difficulties related to multimodal data fusion. To address these challenges, in this work, we introduce a deep neural network-based multimodal HAR algorithm, HAMLET. HAMLET incorporates a hierarchical architecture, where the lower layer encodes spatio-temporal features from unimodal data by adopting a multi-head self-attention mechanism. We develop a novel multimodal attention mechanism for disentangling and fusing the salient unimodal features to compute the multimodal features in the upper layer. Finally, multimodal features are used in a fully connected neural network to recognize human activities. We evaluated our algorithm by comparing…
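The three-stage hierarchy described in the abstract (per-modality attention, attention-based fusion, then classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections in place of multi-head self-attention, a plain softmax-weighted sum in place of the paper's multimodal attention mechanism, and one linear layer as the classifier. All function names, the modality names (`rgb`, `skeleton`, `inertial`), and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(seq):
    """Scaled dot-product self-attention over time steps.

    Assumption: single head, queries = keys = values = inputs
    (no learned projection matrices).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def encode_unimodal(seq):
    """Lower layer: attend over time, then mean-pool to one feature vector."""
    attended = self_attention(seq)
    d = len(seq[0])
    return [sum(step[i] for step in attended) / len(attended) for i in range(d)]


def fuse_modalities(features, query):
    """Upper layer (simplified): softmax-weighted sum of unimodal features.

    `query` is a hypothetical scoring vector standing in for learned parameters.
    """
    weights = softmax([dot(f, query) for f in features])
    d = len(query)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]
    return fused, weights


def classify(fused, class_weights):
    """Final layer: one linear layer followed by softmax over activity classes."""
    return softmax([dot(fused, w) for w in class_weights])


def demo(seed=0, dim=4, steps=5, num_classes=3):
    """Run the sketch on random data; returns (fused, modality_weights, probs)."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-1, 1) for _ in range(dim)]

    # Hypothetical modalities, each a sequence of `steps` feature vectors.
    modalities = {name: [rand_vec() for _ in range(steps)]
                  for name in ("rgb", "skeleton", "inertial")}
    features = [encode_unimodal(seq) for seq in modalities.values()]
    fused, weights = fuse_modalities(features, rand_vec())
    probs = classify(fused, [rand_vec() for _ in range(num_classes)])
    return fused, weights, probs
```

Calling `demo()` returns the fused feature vector, one attention weight per modality, and a probability per activity class. The modality weights sum to 1, which is what lets a fusion layer of this kind emphasize the more informative sensor for a given input.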
