A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System
Santosh Kumar Yadav, Muhtashim Rafiqi, Egna Praneeth Gummana, Kamlesh Tiwari, Hari Mohan Pandey, Shaik Ali Akbar

TL;DR
This paper introduces a new multimodal human activity recognition system that fuses vision and inertial sensor data at the decision level, achieving state-of-the-art accuracy on multiple benchmark datasets.
Contribution
It proposes a novel two-stream fusion approach combining pose estimation and inertial data with deep learning for improved activity recognition accuracy.
Findings
Achieved over 95% accuracy on four benchmark datasets.
Outperformed existing state-of-the-art methods significantly.
Demonstrated robustness across diverse human activity datasets.
Abstract
This paper presents a novel multimodal human activity recognition system that uses a two-stream decision-level fusion of vision and inertial sensors. In the first stream, raw RGB frames are passed to a part affinity field-based pose estimation network to detect the user's keypoints. These keypoints are pre-processed and fed in a sliding-window fashion to a specially designed convolutional neural network for spatial feature extraction, followed by regularized LSTMs that compute the temporal features. The outputs of the LSTM networks are then fed to fully connected layers for classification. In the second stream, data obtained from inertial sensors are pre-processed and fed to regularized LSTMs for feature extraction, followed by fully connected layers for classification. At this stage, the SoftMax scores of the two streams are fused using the decision level…
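The abstract is cut off before it names the fusion rule, so the following is only a minimal sketch of two steps it describes: cutting a keypoint sequence into sliding windows, and combining the per-class SoftMax scores of the two streams at the decision level. Weighted averaging is an assumption chosen for illustration, not necessarily the authors' rule. The function names (`sliding_windows`, `softmax`, `fuse_scores`), the weight `w_vision`, and the window and stride values are all hypothetical.

```python
import numpy as np


def sliding_windows(seq, window, stride):
    """Split a (T, D) sequence into overlapping segments of shape (window, D)."""
    seq = np.asarray(seq, dtype=float)
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window length")
    starts = range(0, len(seq) - window + 1, stride)
    return np.stack([seq[s:s + window] for s in starts])


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_scores(vision_probs, inertial_probs, w_vision=0.5):
    """Decision-level fusion by weighted averaging (an assumed rule).

    Both inputs are (N, C) arrays of class probabilities. Returns the fused
    probabilities and the predicted class index for each sample.
    """
    vision_probs = np.asarray(vision_probs, dtype=float)
    inertial_probs = np.asarray(inertial_probs, dtype=float)
    fused = w_vision * vision_probs + (1.0 - w_vision) * inertial_probs
    return fused, fused.argmax(axis=-1)


# Illustrative data: 10 frames with 2 keypoint coordinates each.
keypoints = np.arange(20).reshape(10, 2)
windows = sliding_windows(keypoints, window=4, stride=2)

# Illustrative logits standing in for the outputs of the two streams.
vision = softmax([[2.0, 1.0, 0.1]])
inertial = softmax([[0.1, 3.0, 0.2]])
fused, labels = fuse_scores(vision, inertial)
```

Because each stream is reduced to a probability vector before the two are combined, the streams can be trained and tuned independently. Only the fusion weight ties them together.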
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Softmax
