TL;DR
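This paper introduces Temporal Binary Representation, an event aggregation method that accumulates event camera output into sequences of binary frames and compresses them into single frames through binary-to-decimal conversion, so that pixel values carry temporal information. Deep learning models trained on these frames achieve state-of-the-art gesture recognition results on the DVS128 Gesture Dataset.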
Contribution
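The paper proposes a new event aggregation strategy that converts event camera output into intermediate binary frames and then into compact frames whose pixel values encode temporal information, making them processable by traditional Computer Vision algorithms and improving gesture recognition performance.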
Findings
Achieved state-of-the-art results on the DVS128 Gesture Dataset
Demonstrated effectiveness on an extension of the dataset collected under more challenging conditions
Proposed a lossless binary-to-decimal transformation for compact encoding
Abstract
In this paper we present an event aggregation strategy to convert the output of an event camera into frames processable by traditional Computer Vision algorithms. The proposed method first generates sequences of intermediate binary representations, which are then losslessly transformed into a compact format by simply applying a binary-to-decimal conversion. This strategy allows us to encode temporal information directly into pixel values, which are then interpreted by deep learning models. We apply our strategy, called Temporal Binary Representation, to the task of Gesture Recognition, obtaining state of the art results on the popular DVS128 Gesture Dataset. To underline the effectiveness of the proposed method compared to existing ones, we also collect an extension of the dataset under more challenging conditions on which to perform experiments.
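The aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the event layout `(t, x, y)`, the window length `delta_t`, the number of bits `n_bits`, the choice to ignore polarity, and the bit ordering (first window as most significant bit) are all assumptions made for illustration, as are the function names `temporal_binary_representation` and `decode`.

```python
import numpy as np

def temporal_binary_representation(events, height, width, n_bits=8, delta_t=1.0):
    """Aggregate events into one frame per pixel via binary-to-decimal conversion.

    events: array-like of shape (K, 3) with columns (t, x, y).
    Assumptions: time starts at 0, polarity is ignored, and the first
    accumulation window maps to the most significant bit.
    """
    events = np.asarray(events, dtype=np.float64).reshape(-1, 3)
    binary = np.zeros((n_bits, height, width), dtype=np.uint8)

    # Index of the accumulation window each event falls into.
    idx = np.floor(events[:, 0] / delta_t).astype(int)
    keep = (idx >= 0) & (idx < n_bits)
    xs = events[keep, 1].astype(int)
    ys = events[keep, 2].astype(int)

    # A pixel is 1 in a window if at least one event occurred there.
    binary[idx[keep], ys, xs] = 1

    # Read the n_bits values of each pixel as a binary number.
    weights = 2 ** np.arange(n_bits - 1, -1, -1, dtype=np.int64)
    frame = np.tensordot(weights, binary.astype(np.int64), axes=1)
    return binary, frame

def decode(frame, n_bits):
    """Recover the binary frames from the decimal frame (shows losslessness)."""
    shifts = np.arange(n_bits - 1, -1, -1, dtype=np.int64)
    bits = (frame.astype(np.int64)[None, :, :] >> shifts[:, None, None]) & 1
    return bits.astype(np.uint8)

if __name__ == "__main__":
    # Three toy events on a 2x2 sensor, aggregated over 4 windows of length 1.
    toy_events = [(0.5, 1, 0), (2.5, 1, 0), (3.2, 0, 1)]
    binary, frame = temporal_binary_representation(toy_events, 2, 2, n_bits=4)
    print(frame)
    print(np.array_equal(decode(frame, 4), binary))
```

In this toy case the pixel at `x=1, y=0` fires in windows 0 and 2, so its bit string is `1010` and its frame value is 10. Decoding the frame returns the original binary frames, which is the sense in which the conversion is lossless.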
