Benchmarking Conventional Vision Models on Neuromorphic Fall Detection and Action Recognition Dataset
Karthik Sivarama Krishnan, Koushik Sivarama Krishnan

TL;DR
This paper benchmarks fine-tuned conventional vision models on neuromorphic datasets for fall detection and action recognition, demonstrating that the MViT architecture achieves the highest accuracy and F1 score among tested models.
Contribution
It introduces a benchmarking framework for conventional vision models on neuromorphic data and identifies the MViT architecture as the most effective for this application.
Findings
DVS-MViT achieves 95.8% accuracy and F1 score.
DVS-C2D achieves 91.6% accuracy and F1 score.
DVS-CSN and DVS-X3D perform less effectively.
Abstract
Neuromorphic vision sensors have gained popularity in recent years for their ability to capture spatio-temporal events with low-power sensing. Unlike traditional cameras, these sensors record events or spikes, which helps preserve the privacy of the subject being recorded. Events are captured as per-pixel brightness changes, and the output data stream encodes the time, location, and pixel intensity change of each event. This paper proposes and benchmarks fine-tuned conventional vision models on neuromorphic human action recognition and fall detection datasets. The spatio-temporal event streams from Dynamic Vision Sensing cameras are encoded into standard sequences of image frames. These video frames are used for benchmarking conventional deep learning-based architectures. In this proposed approach, we fine-tuned the state-of-the-art vision models for…
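The abstract does not say how the event streams are turned into image frames. One common way to do this, shown here only as a minimal sketch and not as the authors' encoding, is to split a recording into equal time bins and count events per pixel in each bin. The function name `events_to_frames`, the `(t, x, y, p)` column layout, the binary polarity `p`, and the two-channel count representation are all assumptions made for illustration.

```python
import numpy as np


def events_to_frames(events, height, width, num_frames):
    """Accumulate events into per-polarity count frames (hypothetical helper).

    events: array-like of shape (N, 4) with columns (t, x, y, p), where p is
            assumed to be 0 (brightness decrease) or 1 (brightness increase).
    Returns an array of shape (num_frames, 2, height, width).
    """
    events = np.asarray(events, dtype=np.float64)
    frames = np.zeros((num_frames, 2, height, width), dtype=np.float32)
    if events.size == 0:
        return frames

    t = events[:, 0]
    x = events[:, 1].astype(np.int64)
    y = events[:, 2].astype(np.int64)
    p = events[:, 3].astype(np.int64)

    # Split the recording's time span into equal-width bins, one per frame.
    span = max(t.max() - t.min(), 1e-9)
    bin_idx = ((t - t.min()) / span * num_frames).astype(np.int64)
    # The last timestamp would land in bin num_frames; clamp it into range.
    bin_idx = np.minimum(bin_idx, num_frames - 1)

    # np.add.at accumulates correctly when the same pixel fires repeatedly.
    np.add.at(frames, (bin_idx, p, y, x), 1.0)
    return frames


# Three synthetic events on a 4x4 sensor, split into two frames.
demo_events = [[0, 1, 2, 1], [5, 1, 2, 1], [10, 0, 0, 0]]
demo_frames = events_to_frames(demo_events, height=4, width=4, num_frames=2)
```

Frames produced this way have a fixed shape, so they can be stacked into a clip and passed to a video model that expects image sequences. Whether the paper uses counts, binary masks, or another representation is not stated in the text above.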
Taxonomy
Methods: Multiscale Vision Transformer
