Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras
Christoffer Koo Øhrstrøm, Ronja Güldenring, Lazaros Nalpantidis

TL;DR
This paper introduces Spiking Patches, a novel tokenization method for event cameras that maintains their asynchronous and sparse nature, enabling faster inference without sacrificing accuracy in gesture recognition and object detection.
Contribution
The paper presents a tokenizer for event cameras that preserves the asynchronous and spatially sparse nature of the event stream, improving inference speed while matching or exceeding the accuracy of frame- and voxel-based representations.
Findings
Tokens from Spiking Patches yield inference times up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames.
Gesture recognition accuracy improves by up to 3.8 in absolute terms.
Object detection accuracy matches or surpasses that of prior representations.
Abstract
We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is to discover an event representation that preserves these properties. Prior works have represented events as frames or as voxels. However, while these representations yield high accuracy, both frames and voxels are synchronous and decrease the spatial sparsity. Spiking Patches gives the means to preserve the unique properties of event cameras and we show in our experiments that this comes without sacrificing accuracy. We evaluate our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and object detection. Tokens from Spiking Patches yield inference times that are up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We achieve this while matching their accuracy…
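The contrast the abstract draws between synchronous, dense representations and asynchronous, sparse tokens can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not describe. The threshold-based rule, the function names `dense_frame_tokens` and `sparse_patch_tokens`, and the `(x, y, t, polarity)` event layout are all assumptions made for illustration only.

```python
from collections import defaultdict


def dense_frame_tokens(events, width, height, patch):
    """Synchronous baseline: bin all events into one frame and emit a token
    for every patch, including patches that received no events."""
    cols, rows = width // patch, height // patch
    counts = [[0] * cols for _ in range(rows)]
    for x, y, _t, _p in events:
        counts[y // patch][x // patch] += 1
    return [(r, c, counts[r][c]) for r in range(rows) for c in range(cols)]


def sparse_patch_tokens(events, patch, threshold):
    """Hypothetical asynchronous rule (an assumption, not the paper's method):
    a patch emits a token as soon as it has accumulated `threshold` events,
    then resets. Patches with too little activity emit nothing."""
    buffers = defaultdict(list)
    tokens = []
    for x, y, t, p in events:  # events are assumed to be sorted by time
        key = (y // patch, x // patch)
        buffers[key].append((x, y, t, p))
        if len(buffers[key]) >= threshold:
            # The token carries the patch index, the time of the triggering
            # event, and the events that produced it.
            tokens.append((key, t, buffers.pop(key)))
    return tokens
```

On a 16x16 sensor with 4x4 patches, the dense version always returns 16 tokens per frame regardless of activity, whereas the sparse version returns tokens only for patches that were active enough, each stamped with the time at which it fired. That difference in token count is the kind of saving the reported inference speedups would depend on, under the assumptions stated above.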
