SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities
Yasser Ashraf, Ahmed Sharshar, Velibor Bojkovic, Bin Gu

TL;DR
This paper introduces SPACT18, a novel multimodal dataset with spike camera, RGB, and thermal data for benchmarking spiking neural networks in action recognition, promoting energy-efficient video understanding.
Contribution
It provides the first spike camera-based action recognition dataset with synchronized RGB and thermal modalities for comprehensive benchmarking.
Findings
Dataset enables multimodal video understanding research.
Preserves sparsity and temporal precision of spiking data.
Facilitates comparison of spiking, thermal, and RGB modalities.
Abstract
Spike cameras, bio-inspired vision sensors, asynchronously fire spikes by accumulating light intensities at each pixel, offering ultra-high energy efficiency and exceptional temporal resolution. Unlike event cameras, which record changes in light intensity to capture motion, spike cameras provide even finer spatiotemporal resolution and a more precise representation of continuous changes. In this paper, we introduce the first video action recognition (VAR) dataset captured with a spike camera, alongside synchronized RGB and thermal modalities, to enable comprehensive benchmarking for Spiking Neural Networks (SNNs). By preserving the inherent sparsity and temporal precision of spiking data, our three datasets offer a unique platform for exploring multimodal video understanding and serve as a valuable resource for directly comparing spiking, thermal, and RGB modalities. This work contributes a…
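The accumulate-and-fire behaviour the abstract describes can be illustrated with a toy model of a single pixel. This is only a minimal sketch, not the sensor's actual circuit or the authors' pipeline: the function names `spike_train` and `firing_rate`, the threshold value, the discrete time steps, and the subtract-on-fire reset are all assumptions made for illustration.

```python
def spike_train(intensities, threshold=1.0):
    """Toy integrate-and-fire pixel (hypothetical model, not the real sensor).

    Accumulates the incoming intensity at each time step and emits a spike (1)
    whenever the accumulator reaches the threshold, otherwise emits 0.
    Assumes each intensity value is at most `threshold`, so at most one
    spike is needed per step.
    """
    accumulator = 0.0
    spikes = []
    for value in intensities:
        accumulator += value
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold  # assumed reset rule: subtract the threshold
        else:
            spikes.append(0)
    return spikes


def firing_rate(spikes):
    """Fraction of time steps that contain a spike."""
    return sum(spikes) / len(spikes)


# A brighter pixel crosses the threshold more often than a dimmer one.
bright = spike_train([0.5] * 8)
dim = spike_train([0.25] * 8)
print(bright, firing_rate(bright))
print(dim, firing_rate(dim))
```

In this toy model the firing rate tracks the constant input intensity, which is one way to see why a spike stream can encode absolute brightness rather than only changes in brightness.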
Peer Reviews
Decision·Submitted to ICLR 2025
1. This paper is simple and easy to understand.
1. The authors emphasize that the temporal resolution of spike cameras is superior to that of event cameras, but I think that is wrong. The source cited as proof is an algorithm paper rather than a hardware paper, which I don't think is a good starting point. The sensor in [1] already achieved a temporal resolution of 3.6 μs in 2011. [1] Leñero-Bardallo J A, Serrano-Gotarredona T, Linares-Barranco B. A 3.6 μs Latency Asynchronous Frame-Free Event-Driven Dynamic-Vision-Sens
This article proposes SPACT18, a novel large-scale event-based dataset for the VAR task. The dynamic response characteristics of events and the low power consumption of SNNs open new possibilities for VAR. The method section also proposes a novel alignment method for event pulses, a distinctive compression method suited to VAR tasks. The article makes some progress on, and contributions to, VAR tasks.
The experimental section is somewhat thin. Using only two typical VAR algorithms is insufficient to verify the distribution of the RGB and thermal modality data. In addition, the SNN validation still relies on the ANN-SNN conversion method, which cannot fully exploit the parsing ability that SNNs should have.
- The dataset is original and diverse, featuring 44 participants and 18 activities, making it a sufficiently challenging resource for validating models.
- With three modalities (spike, RGB, and thermal), the dataset is highly informative and suitable for researchers in image processing, spiking neural networks, and multimodal analysis.
- The writing is clear, and the paper is well-structured.
- The authors validate the dataset using popular architectures, demonstrating its applicability.
1. The paper lacks a thorough comparison with existing action recognition datasets, particularly in terms of modality, participant diversity, types of activities, sample size, and duration. Given that the primary contribution of this work is the proposed dataset, a clear comparison would help emphasize its advantages over existing datasets.
2. The paper cites only one spike-based action recognition dataset (Amir et al., 2021). However, there exist other spike-based action recognition datasets, s
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Anomaly Detection Techniques and Applications
