BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation

Sihang Guo, Chenlin Zhou, Jiaqi Wang, Kehai Chen, Qingyan Meng, and Zhengyu Ma

arXiv:2605.13859 · cs.NE · May 15, 2026

TL;DR

BiSpikCLM is a fully binary spiking language model that eliminates floating-point operations through Softmax-Free Spiking Attention and is trained with Spike-Aware Alignment Distillation, achieving competitive NLP performance at a fraction of the computational cost.

Contribution

The paper introduces BiSpikCLM, the first fully binary, MatMul-free spiking causal language model, combining Softmax-Free Spiking Attention (SFSA) with Spike-Aware Alignment Distillation (SpAD) for efficient training.

Findings

01 BiSpikCLM reaches performance comparable to ANN models using only 5.6% of the training tokens.

02 The model requires only 4.16% to 5.87% of the computational cost of traditional models.

03 The proposed methods demonstrate the feasibility of fully binary, spike-driven NLP models.

Abstract

Spiking Neural Networks (SNNs) offer a promising energy-efficient alternative to large language models (LLMs) thanks to their event-driven nature and ultra-low power consumption. However, to preserve capacity, most existing spiking LLMs still rely on intensive floating-point matrix multiplication (MatMul) and nonlinearities, or suffer from training difficulties arising from complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), which eliminates softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns an ANN teacher and an SNN student across embeddings, attention maps, intermediate features, and output logits. The SpAD framework allows BiSpikCLM to reach…
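
To make the idea of softmax-free attention over binary spikes more concrete, the following is a minimal sketch, not the paper's SFSA definition. It assumes queries, keys, and values are already binary spike matrices, applies a lower-triangular causal mask, and re-binarizes the result with a simple threshold. The function names, the `threshold` parameter, and the use of NumPy are all illustrative assumptions.

```python
import numpy as np


def heaviside_spike(x, threshold):
    """Emit a binary spike (1) wherever the input reaches the threshold."""
    return (x >= threshold).astype(np.int64)


def softmax_free_causal_attention(q, k, v, threshold=1):
    """Illustrative softmax-free attention over binary spike matrices.

    q, k, v: integer arrays of shape (seq_len, d) with entries in {0, 1}.
    Hypothetical formulation for illustration only; the actual SFSA
    operator in the paper may differ.
    """
    seq_len = q.shape[0]
    # With binary inputs, each score counts coinciding spikes, so the
    # product reduces to integer accumulation rather than float MatMul.
    scores = q @ k.T
    # Causal mask: position t may only attend to positions <= t.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=np.int64))
    scores = scores * causal
    # No softmax: the integer scores weight the binary values directly.
    accumulated = scores @ v
    # Thresholding turns the accumulated counts back into binary spikes.
    return heaviside_spike(accumulated, threshold)


q = np.array([[1, 0], [1, 1], [0, 1]])
k = np.array([[1, 0], [0, 1], [1, 1]])
v = np.array([[1, 0], [0, 1], [1, 1]])
spikes = softmax_free_causal_attention(q, k, v, threshold=1)
```

Every intermediate value here is an integer, and the output is again a {0, 1} matrix, which is the property a fully binary, spike-driven pipeline would need. How the paper sets thresholds, handles multiple time steps, or scales the scores is not stated in the abstract and is not modeled in this sketch.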
