A Multi-Modal CNN-LSTM Framework with Multi-Head Attention and Focal Loss for Real-Time Elderly Fall Detection
Lijie Zhou, Luran Wang

TL;DR
This paper introduces a multi-modal deep learning framework that combines a CNN, an LSTM, multi-head attention, and focal loss for real-time elderly fall detection on wearable sensor data, reporting a 98.7% F1-score on SisFall and sub-50 ms inference latency on edge devices.
Contribution
MultiModalFallDetector integrates a multi-scale CNN, multi-head attention, and transfer learning to improve fall detection accuracy and efficiency over existing methods.
Findings
Achieves a 98.7% F1-score on the SisFall dataset
Maintains sub-50 ms inference latency on edge devices
Outperforms traditional machine learning and standard deep learning approaches
Abstract
The increasing global aging population has intensified the demand for reliable health monitoring systems, particularly those capable of detecting critical events such as falls among elderly individuals. Traditional fall detection approaches relying on single-modality acceleration data suffer from high false alarm rates, while conventional machine learning methods require extensive hand-crafted feature engineering. This paper proposes a novel multi-modal deep learning framework, MultiModalFallDetector, designed for real-time elderly fall detection using wearable sensors. Our approach integrates multiple innovations: a multi-scale CNN-based feature extractor capturing motion dynamics at varying temporal resolutions; fusion of tri-axial accelerometer, gyroscope, and four-channel physiological signals; incorporation of a multi-head self-attention mechanism for dynamic temporal weighting;…
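The title and TL;DR name focal loss as one component of the framework. The sketch below shows the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), to illustrate why it suits data where falls are rare compared with ordinary activity. It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and alpha=0.25 and gamma=2.0 are commonly used defaults, not values reported on this page.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive ("fall") class
    y: true label, 1 for fall and 0 for non-fall
    alpha, gamma: assumed defaults, not taken from the paper
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over a batch of (probability, label) pairs."""
    losses = [focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confidently correct fall prediction contributes far less loss
    # than a confidently wrong one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger gamma values shift the training signal toward hard, misclassified windows.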
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Balance, Gait, and Falls Prevention · Human Pose and Action Recognition
