EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras
Xueyi Wang

TL;DR
This paper introduces EGOFALLS, a new egocentric video dataset and benchmark for fall detection aimed at vulnerable populations such as the elderly, and reports that fusing audio and visual information improves detection accuracy.
Contribution
It presents what the authors believe is the first public egocentric video dataset for fall detection and proposes a multimodal method with a late decision fusion layer that improves detection performance.
Findings
Fusion of audio and visual data improves fall detection accuracy.
The dataset contains 10,948 samples from 14 subjects.
Late decision fusion outperforms individual modality approaches.
Abstract
Falls are a significant and often fatal risk for vulnerable populations such as the elderly. Previous works have addressed fall detection by relying on data captured by a single sensor, such as images or accelerometers. In this work, we rely on multimodal descriptors extracted from videos captured by egocentric cameras. Our proposed method includes a late decision fusion layer that builds on top of the extracted descriptors. Furthermore, we collect a new dataset on which we assess our proposed approach. We believe this is the first public dataset of its kind. The dataset comprises 10,948 video samples from 14 subjects. We conducted ablation experiments to assess the performance of individual feature extractors, fusion of visual information, and fusion of both visual and audio information. Moreover, we experimented with internal and external cross-validation. Our results demonstrate that the…
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
