Exploiting Spatial Sparsity for Event Cameras with Visual Transformers

Zuowen Wang; Yuhuang Hu; Shih-Chii Liu

arXiv:2202.05054 · cs.CV · February 11, 2022


TL;DR

This paper introduces a method that uses visual transformers to process event camera data efficiently by feeding only spatially active patches to the model. This cuts the average number of processed patches by at least 50% with a 0.34% accuracy drop on N-Caltech101.

Contribution

It proposes a novel patch selection strategy for ViT models that exploits spatial sparsity in event camera data, improving efficiency.

Findings

01. At least 50% reduction in the average number of patches fed into the ViT backbone during inference.

02. 51% decrease in Multiply-Accumulate (MAC) operations.

03. Only a 0.34% drop in classification accuracy on N-Caltech101.
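To see why halving the number of tokens can cut MACs by slightly more than half, here is a rough per-block count for a standard ViT encoder block. It is an illustration under assumptions, not the paper's accounting. It assumes the usual layout: a QKV projection, two attention matrix products, an output projection, and an MLP with hidden width `mlp_ratio * dim`. The per-block cost is then roughly 12·N·D² + 2·N²·D for N tokens of width D. The patch embedding, class token, normalization, and softmax are ignored. `vit_block_macs` and `mac_reduction` are hypothetical names.

```python
def vit_block_macs(n_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one standard ViT encoder block."""
    qkv = 3 * n_tokens * dim * dim                 # Q, K, V linear projections
    scores = n_tokens * n_tokens * dim             # Q @ K^T
    weighted = n_tokens * n_tokens * dim           # softmax(scores) @ V
    out_proj = n_tokens * dim * dim                # attention output projection
    mlp = 2 * n_tokens * dim * (mlp_ratio * dim)   # two MLP linear layers
    return qkv + scores + weighted + out_proj + mlp


def mac_reduction(full_tokens: int, kept_tokens: int, dim: int) -> float:
    """Fractional MAC saving when only kept_tokens of full_tokens are processed.
    Network depth cancels out because every block has the same cost."""
    full = vit_block_macs(full_tokens, dim)
    kept = vit_block_macs(kept_tokens, dim)
    return 1.0 - kept / full


if __name__ == "__main__":
    # Hypothetical sizes: a 14x14 patch grid, half of the patches kept, width 768.
    print(f"{mac_reduction(196, 98, 768):.3f}")
```

The linear terms dominate whenever N is much smaller than 6·D, so the MAC saving closely tracks the patch reduction. The quadratic attention term pushes it slightly above 0.5 in this example. The sizes used here are arbitrary and do not reproduce the paper's measured 51%.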

Abstract

Event cameras report local changes of brightness through an asynchronous stream of output events. Events are spatially sparse at pixel locations with little brightness variation. We propose using a visual transformer (ViT) architecture to leverage its ability to process a variable-length input. The input to the ViT consists of events that are accumulated into time bins and spatially separated into non-overlapping sub-regions called patches. Patches are selected when the number of nonzero pixel locations within a sub-region is above a threshold. We show that by fine-tuning a ViT model on the selected active patches, we can reduce the average number of patches fed into the backbone during inference by at least 50% with only a minor drop (0.34%) in classification accuracy on the N-Caltech101 dataset. This reduction translates into a decrease of 51% in Multiply-Accumulate (MAC)…
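The patch selection step described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Events are `(x, y, t, polarity)` tuples with in-bounds coordinates, and polarity is ignored when accumulating counts. A pixel location counts as nonzero if any time bin holds an event there. "Above a threshold" is read as strictly greater. The functions `events_to_frames`, `select_active_patches`, and `gather_tokens` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, timestamp, polarity); not specified in the source.
Event = Tuple[int, int, float, int]
Frames = List[List[List[int]]]  # indexed as [bin][y][x], holding event counts


def events_to_frames(events: Sequence[Event], width: int, height: int,
                     t_start: float, t_end: float, num_bins: int) -> Frames:
    """Accumulate events into num_bins equal-length time bins of per-pixel counts.
    Polarity is ignored and coordinates are assumed to be in bounds."""
    frames = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    bin_len = (t_end - t_start) / num_bins
    for x, y, t, _polarity in events:
        if not t_start <= t < t_end:
            continue  # drop events outside the time window
        b = min(int((t - t_start) / bin_len), num_bins - 1)
        frames[b][y][x] += 1
    return frames


def select_active_patches(frames: Frames, patch: int, threshold: int) -> List[int]:
    """Return row-major indices of non-overlapping patch x patch regions whose
    number of nonzero pixel locations (in any bin) is above threshold."""
    height, width = len(frames[0]), len(frames[0][0])
    rows, cols = height // patch, width // patch  # partial edge regions are dropped
    active = []
    for r in range(rows):
        for c in range(cols):
            nonzero = sum(
                1
                for y in range(r * patch, (r + 1) * patch)
                for x in range(c * patch, (c + 1) * patch)
                if any(frame[y][x] != 0 for frame in frames)
            )
            if nonzero > threshold:
                active.append(r * cols + c)
    return active


def gather_tokens(frames: Frames, patch: int,
                  indices: Sequence[int]) -> List[Tuple[int, List[int]]]:
    """Flatten each selected patch across all bins into one token vector,
    keeping its grid index for the positional-embedding lookup."""
    cols = len(frames[0][0]) // patch
    tokens = []
    for idx in indices:
        r, c = divmod(idx, cols)
        vec = [frame[y][x]
               for frame in frames
               for y in range(r * patch, (r + 1) * patch)
               for x in range(c * patch, (c + 1) * patch)]
        tokens.append((idx, vec))
    return tokens


if __name__ == "__main__":
    events = [(0, 0, 0.0, 1), (1, 0, 0.1, 1), (0, 1, 0.2, 0), (5, 5, 0.9, 1)]
    frames = events_to_frames(events, width=8, height=8,
                              t_start=0.0, t_end=1.0, num_bins=2)
    active = select_active_patches(frames, patch=4, threshold=2)
    print(active)  # [0]: only the top-left patch has more than 2 active pixels
    tokens = gather_tokens(frames, patch=4, indices=active)
    print(len(tokens), len(tokens[0][1]))  # 1 32
```

Because only the selected patches become tokens, the sequence length varies from sample to sample. This is the property of the ViT the abstract relies on. The kept indices would also select the matching positional embeddings. The threshold and sizes in the example are arbitrary.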

