Spike-driven Large Language Model
Han Xu, Xuerui Qiu, Baiyu Chen, Xinhao Luo, Xingrun Xing, Jiahong Zhang, Bo Lei, Tiejun Huang, Bo Xu, Guoqi Li

TL;DR
SDLLM introduces a spike-driven large language model that replaces dense matrix multiplications with sparse additions, significantly reducing inference costs while maintaining state-of-the-art performance.
Contribution
The paper proposes SDLLM, a novel spike-driven LLM that uses sparse addition operations and advanced spike encoding to improve efficiency and accuracy over previous spike-based models.
Findings
SDLLM reduces energy consumption by 7x compared to previous spike-based models.
SDLLM improves accuracy by 4.2% over prior spike-based LLMs.
The model halves the number of time steps needed for inference.
Abstract
Current Large Language Models (LLMs) are primarily based on large-scale dense matrix multiplications. Inspired by the brain's information-processing mechanisms, we explore a fundamental question: how can the brain's spike-driven characteristics be effectively integrated into LLM inference? Spiking Neural Networks (SNNs) are inherently spike-driven, and some works have attempted to combine SNNs with Transformers. However, achieving spike-driven LLMs with billions of parameters that rely solely on sparse additions remains a challenge in the SNN field. To address the limited representational capacity and sparsity of existing spike encoding schemes at the LLM scale, we propose SDLLM, a spike-driven large language model that eliminates dense matrix multiplications through sparse addition operations. Specifically, we use the plug-and-play gamma-SQP two-step spike encoding…
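To make the "sparse additions instead of dense matrix multiplications" idea concrete, here is a minimal sketch in plain Python. It assumes binary (0/1) input spikes and a single linear layer. It is not the paper's gamma-SQP encoding or the SDLLM implementation, and the function names dense_matvec and spike_driven_matvec are hypothetical, chosen only for illustration. With binary inputs, multiplying by a weight matrix reduces to summing the weight rows of the inputs that fired, so the work scales with the number of spikes rather than the layer size.

```python
import random

def dense_matvec(x, W):
    """Reference: y[j] = sum_i x[i] * W[i][j], one multiply per weight."""
    n_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]

def spike_driven_matvec(spikes, W):
    """Accumulate rows of W only where the input spiked (spikes[i] == 1).

    No multiplications are performed: each active input adds its weight
    row to the output. Returns the output and the number of additions.
    """
    n_out = len(W[0])
    y = [0.0] * n_out
    additions = 0
    for i, s in enumerate(spikes):
        if s:  # silent inputs are skipped entirely
            for j in range(n_out):
                y[j] += W[i][j]
            additions += n_out
    return y, additions

# Illustrative sizes and firing rate (assumptions, not values from the paper).
random.seed(0)
n_in, n_out = 8, 4
W = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
spikes = [1 if random.random() < 0.25 else 0 for _ in range(n_in)]

y_dense = dense_matvec(spikes, W)
y_spike, additions = spike_driven_matvec(spikes, W)
print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(y_dense, y_spike)))
print("additions used:", additions, "vs dense multiply-adds:", n_in * n_out)
```

The two functions agree on the output up to floating-point rounding, but the spike-driven version performs a number of additions proportional to the number of active inputs, which is the source of the efficiency gain when spikes are sparse.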
