Spike-driven Transformer

Man Yao; Jiakui Hu; Zhaokun Zhou; Li Yuan; Yonghong Tian; Bo Xu; Guoqi Li

arXiv:2307.01694 · cs.NE · July 6, 2023 · 35 cites

TL;DR

This paper introduces a spike-driven Transformer that relies on event-driven, binary spike communication and sparse addition operations, achieving significant energy savings and state-of-the-art accuracy among SNNs on ImageNet-1K.

Contribution

It proposes a novel Spike-Driven Self-Attention (SDSA) mechanism with linear complexity and zero multiplications, integrating spike-based operations into the Transformer architecture.
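To make "linear complexity and zero multiplication" concrete, here is a minimal NumPy sketch of a mask-and-add attention over binary spike matrices. This summary does not spell out the exact SDSA formula, so the form used here (AND of Query and Key, column-wise sum over tokens, a threshold, then a mask on Value) is an illustrative assumption. The function name `sdsa_sketch`, the `threshold` parameter, and the use of a plain threshold in place of a spiking neuron are all hypothetical.

```python
import numpy as np

def sdsa_sketch(q, k, v, threshold=1):
    """Mask-and-add attention over binary (0/1) integer arrays of shape (N, D).

    Illustrative assumption, not the paper's reference implementation.
    """
    # Hadamard product of two binary spike matrices equals a logical AND (a mask).
    qk_mask = q & k
    # Column-wise addition over the N tokens gives one count per channel: shape (D,).
    col_counts = qk_mask.sum(axis=0)
    # Hypothetical stand-in for a spiking neuron: fire when the count reaches the threshold.
    channel_gate = (col_counts >= threshold).astype(q.dtype)
    # Mask the channels of V with the binary gate; no multiplication is needed.
    return v & channel_gate

q = np.array([[1, 0, 1], [0, 0, 1]])
k = np.array([[1, 1, 0], [0, 0, 1]])
v = np.array([[1, 1, 1], [0, 1, 1]])
out = sdsa_sketch(q, k, v)
```

Every step touches each of the N × D entries a constant number of times, so the cost of this sketch grows linearly in both the token count N and the channel count D, and no N × N attention map is ever formed.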

Findings

01. Achieves 77.1% top-1 accuracy on ImageNet-1K

02. Up to 87.2x lower computation energy than traditional self-attention

03. Operates with only sparse addition operations

Abstract

Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any…
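Property 2 above, that a matrix multiplication with a binary spike matrix reduces to sparse additions, can be checked with a small NumPy sketch. The function name `spike_matmul_as_additions` and the random test data are hypothetical and chosen only for illustration. The loop is written for clarity rather than speed.

```python
import numpy as np

def spike_matmul_as_additions(spikes, weights):
    """Compute spikes @ weights using only row selection and addition.

    spikes:  (N, D_in) array of 0/1 values (binary spike matrix).
    weights: (D_in, D_out) real-valued weight matrix.
    """
    n_tokens = spikes.shape[0]
    out = np.zeros((n_tokens, weights.shape[1]), dtype=weights.dtype)
    for i in range(n_tokens):
        # Indices of the inputs that fired for this token.
        active = np.flatnonzero(spikes[i])
        # Event-driven: a row with no spikes triggers no computation.
        if active.size:
            # Each spike selects one weight row; the selected rows are summed.
            out[i] = weights[active].sum(axis=0)
    return out

rng = np.random.default_rng(0)
spikes = (rng.random((4, 8)) < 0.2).astype(np.int64)  # sparse binary input
weights = rng.standard_normal((8, 3))
result = spike_matmul_as_additions(spikes, weights)
```

Because each entry of `spikes` is 0 or 1, multiplying by it either drops a weight row or keeps it unchanged, so the product is a sum over the rows selected by the spikes. The number of row additions scales with the number of spikes, not with the full matrix size.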

Code & Models

Repositories

biclab/spike-driven-transformer (PyTorch, official)

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Ferroelectric and Negative Capacitance Devices

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings