NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics
Zhengzheng Tang

TL;DR
NeuronSpark shows that a pure spiking neural network can be trained from scratch for large-scale language modeling, reaching a pretraining loss of 3.6 and early dialogue capabilities without a Transformer backbone or Transformer distillation.
Contribution
Introduces NeuronSpark, a 0.9B-parameter SNN language model trained from scratch with selective state-space spiking dynamics and dedicated stabilization techniques, showing that the approach is feasible at this scale.
Findings
Reaches a pretraining loss of 3.6 after about 1.4B pretraining tokens
Displays early multi-turn dialogue behavior after supervised fine-tuning
Supports the feasibility of end-to-end SNN language modeling at this scale
Abstract
We ask whether a pure spiking backbone can learn large-scale language modeling from random initialization, without Transformer distillation. We introduce NeuronSpark, a 0.9B-parameter SNN language model trained with next-token prediction and surrogate gradients. The model combines selective state-space spiking dynamics, leakage-current inter-layer communication, PonderNet adaptive timesteps, fused Triton PLIF kernels, and stabilization techniques (residual centering, lateral-inhibition normalization, and natural-gradient compensation). Under a constrained budget (about 1.4B pretraining tokens and 6.5K SFT steps), NeuronSpark-0.9B reaches 3.6 pretraining loss and shows early multi-turn dialogue behavior after SFT. These results support the feasibility of end-to-end language modeling with a pure SNN architecture at this scale.
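To make the spiking terminology in the abstract concrete, the following is a minimal sketch of a single parametric leaky integrate-and-fire (PLIF) neuron together with a sigmoid-shaped surrogate gradient. It is not NeuronSpark's implementation: the update rule follows the commonly used PLIF formulation with a learnable leak `sigmoid(w)`, and the function names, the hard reset, the threshold of 1.0, and the surrogate sharpness `alpha` are all assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def plif_forward(inputs, w=0.0, v_threshold=1.0, v_reset=0.0):
    """Run one PLIF neuron over a sequence of input currents.

    Assumed (hypothetical) formulation:
        v[t] = v[t-1] + sigmoid(w) * (x[t] - (v[t-1] - v_reset))
    followed by a spike and a hard reset when v[t] >= v_threshold.
    Returns (spikes, membrane potentials before reset).
    """
    k = sigmoid(w)  # learnable leak factor in (0, 1), plays the role of 1/tau
    v = v_reset
    spikes, potentials = [], []
    for x in inputs:
        v = v + k * (x - (v - v_reset))  # leaky integration step
        potentials.append(v)
        s = 1 if v >= v_threshold else 0  # non-differentiable threshold
        spikes.append(s)
        if s:
            v = v_reset  # hard reset after a spike (assumption)
    return spikes, potentials


def surrogate_grad(v, v_threshold=1.0, alpha=4.0):
    """Derivative of sigmoid(alpha * (v - v_threshold)).

    Used in the backward pass in place of the threshold's true
    derivative, which is zero almost everywhere.
    """
    s = sigmoid(alpha * (v - v_threshold))
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    spikes, potentials = plif_forward([2.0, 2.0, 0.0])
    print(spikes)               # [1, 1, 0]
    print(surrogate_grad(1.0))  # 1.0
```

With `w = 0` the leak factor is 0.5, so an input of 2.0 lifts the membrane potential from rest exactly to the threshold and triggers a spike. The surrogate gradient peaks at the threshold, which is what lets next-token-prediction gradients flow through otherwise binary spikes.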
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Neurobiology of Language and Bilingualism
