Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding

Zhanglu Yan; Kaiwen Tang; Zixuan Zhu; Zhenyu Bai; Qianhui Liu; Weng-Fai Wong

arXiv:2601.22876 · cs.LG · February 2, 2026

Open Access

TL;DR

Matterhorn introduces an energy-efficient spiking transformer with novel encoding and memristive hardware, achieving state-of-the-art accuracy and significant energy savings on language tasks.

Contribution

The paper presents a new spiking transformer architecture with masked time-to-first-spike encoding and memristive synapse units, reducing energy consumption while improving accuracy.

Findings

01 Surpasses existing SNNs by 1.42% on the GLUE benchmark

02 Achieves a 2.31x energy efficiency improvement

03 Maintains high accuracy with reduced spike movement and memory access costs

Abstract

Spiking neural networks (SNNs) have emerged as a promising candidate for energy-efficient LLM inference. However, current energy evaluations for SNNs primarily focus on counting accumulate operations, and fail to account for real-world hardware costs such as data movement, which can consume nearly 80% of the total energy. In this paper, we propose Matterhorn, a spiking transformer that integrates a novel masked time-to-first-spike (M-TTFS) encoding method to reduce spike movement and a memristive synapse unit (MSU) to eliminate weight access overhead. M-TTFS employs a masking strategy that reassigns the zero-energy silent state (a spike train of all 0s) to the most frequent membrane potential rather than the lowest. This aligns the coding scheme with the data distribution, minimizing spike movement energy without information loss. We further propose a "dead zone" strategy that maximizes…
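
The masking idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-slot-per-level spike layout, the function names (`ttfs_encode`, `ttfs_decode`, `total_spikes`, `most_frequent_level`), and the use of raw spike count as a stand-in for spike movement energy are all assumptions made for illustration. It only shows that moving the all-zero silent state from the lowest level to the most frequent level reduces the number of emitted spikes while remaining losslessly decodable.

```python
from collections import Counter


def ttfs_encode(level, num_levels, silent_level=0):
    """Encode a quantized level as a spike train with at most one spike.

    Hypothetical layout: the train has num_levels - 1 time slots, the
    silent level maps to the all-zero train, and every other level gets
    its own slot.
    """
    if not 0 <= level < num_levels:
        raise ValueError("level out of range")
    train = [0] * (num_levels - 1)
    if level == silent_level:
        return train  # silent state: no spike is emitted
    # Skip over the silent level so the remaining levels fill the slots.
    slot = level if level < silent_level else level - 1
    train[slot] = 1
    return train


def ttfs_decode(train, silent_level=0):
    """Invert ttfs_encode for the same silent_level."""
    if 1 not in train:
        return silent_level
    slot = train.index(1)
    return slot if slot < silent_level else slot + 1


def total_spikes(levels, num_levels, silent_level):
    """Count spikes emitted for a batch of levels (assumed energy proxy)."""
    return sum(sum(ttfs_encode(v, num_levels, silent_level)) for v in levels)


def most_frequent_level(levels):
    """Pick the level that occurs most often in the batch."""
    return Counter(levels).most_common(1)[0][0]


# Made-up quantized membrane potentials with 4 levels; level 2 dominates.
levels = [2, 2, 2, 1, 3, 2, 0, 2]
baseline = total_spikes(levels, 4, silent_level=0)
masked_level = most_frequent_level(levels)
masked = total_spikes(levels, 4, silent_level=masked_level)
print(baseline, masked)
```

In this made-up batch the silent state covers five of the eight values once it is moved to level 2, compared with one of eight when it sits at level 0, and decoding with the same `silent_level` still recovers every value.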

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing · Neural dynamics and brain function