SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Rui-Jie Zhu, Qihang Zhao, Guoqi Li, Jason K. Eshraghian

TL;DR
SpikeGPT introduces a large, energy-efficient spiking neural network-based language model that maintains competitive performance with traditional models while significantly reducing computational requirements, especially on neuromorphic hardware.
Contribution
The paper presents the first large-scale backpropagation-trained SNN for language generation, replacing attention in the transformer architecture with a linear-complexity mechanism.
Findings
SpikeGPT is the largest backpropagation-trained SNN for language.
It achieves competitive performance on benchmarks.
It uses 20x fewer computational operations on neuromorphic hardware.
Abstract
As the size of large language models continues to scale, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse and event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have also proven to be more challenging to train. As a result, their performance lags behind modern deep learning, and we are yet to see the effectiveness of SNNs in language generation. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement 'SpikeGPT', a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model: 45M and 216M parameters. To the best of…
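To make "binary, event-driven spiking activation units" concrete, here is a minimal sketch of a generic leaky integrate-and-fire (LIF) neuron. It is not taken from the SpikeGPT code: the function names, the decay factor `beta`, the `threshold`, and the soft-reset rule are all illustrative assumptions.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a sequence of input currents.

    Returns a list of binary spikes (0 or 1), one per time step.
    beta and threshold are illustrative values, not SpikeGPT's settings.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        membrane = beta * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane -= threshold  # soft reset after firing (an assumption)
        else:
            spikes.append(0)
    return spikes


def firing_rate(spikes):
    """Fraction of time steps on which the neuron emitted a spike."""
    return sum(spikes) / len(spikes)


spikes = lif_forward([0.3, 0.3, 0.3, 0.3, 0.0, 0.0, 1.2, 0.1])
print(spikes, firing_rate(spikes))
```

Because the output is 0 on most steps, a downstream layer on event-driven hardware only needs to do work when a spike arrives, which is the intuition behind the operation savings claimed for sparse activations.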
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Modular Robots and Swarm Intelligence
