QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Chenlin Zhou; Han Zhang; Zhaokun Zhou; Liutao Yu; Liwei Huang; Xiaopeng Fan; Li Yuan; Zhengyu Ma; Huihui Zhou; Yonghong Tian

arXiv:2403.16552·cs.NE·May 22, 2026·6 cites

TL;DR

QKFormer is a hierarchical spiking transformer built on a novel Q-K attention mechanism; it reaches state-of-the-art ImageNet-1k accuracy for directly trained SNNs.

Contribution

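It presents a spike-form Q-K attention mechanism with linear complexity, a hierarchical structure that yields multi-scale spiking representations, and a patch embedding module with a deformed shortcut, which together improve performance over existing spiking transformers.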

Findings

01 QKFormer achieves 85.65% top-1 accuracy on ImageNet-1k.

02 It outperforms Spikformer by 10.84 percentage points in top-1 accuracy on ImageNet-1k.

03 The model demonstrates superior performance on various datasets.

Abstract

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we…
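The abstract describes Q-K attention only in words: binary vectors that mark which tokens (or channels) matter, computed in linear time. The sketch below is one plausible reading of the token variant, not the authors' implementation. It assumes the token mask is obtained by summing each row of a binary Q matrix over channels and thresholding the result, and that the mask is then applied row-wise to a binary K matrix. The names `heaviside_spike` and `qk_token_attention`, the single-step threshold neuron, and the threshold value are all hypothetical.

```python
def heaviside_spike(value, threshold=1.0):
    """Simplified one-step spiking neuron (assumption): emit 1 if input reaches threshold."""
    return 1 if value >= threshold else 0


def qk_token_attention(q_spikes, k_spikes, threshold=1.0):
    """Mask the tokens (rows) of K with a binary importance vector derived from Q.

    q_spikes, k_spikes: N x D nested lists of 0/1 spikes (N tokens, D channels).
    Cost is O(N * D): one pass to sum the rows of Q, one pass to mask K.
    No N x N attention map is ever formed.
    """
    # Sum each token's spikes over the channel dimension, then binarize.
    token_mask = [heaviside_spike(sum(row), threshold) for row in q_spikes]
    # Row-wise masking: a token of K is kept only if its mask bit is 1.
    masked_k = [[bit * k for k in row] for bit, row in zip(token_mask, k_spikes)]
    return token_mask, masked_k


# Toy input: 3 tokens, 3 channels.
q = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
k = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
mask, out = qk_token_attention(q, k, threshold=2.0)
print(mask)  # [1, 0, 0]
print(out)   # [[1, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Under the same assumptions, a channel variant would sum Q over tokens instead of channels and mask the columns of K. Because the mask is binary, applying it needs no multiplications or softmax in principle, which fits the abstract's emphasis on energy efficiency.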

Code & Models

Repositories

zhouchenlin2096/QKFormer
github

Videos

QKFormer: Hierarchical Spiking Transformer using Q-K Attention · slideslive

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Neural dynamics and brain function

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout