BrainTransformers: SNN-LLM
Zhengzheng Tang, Eva Zhu

TL;DR
BrainTransformers is a large language model built on Spiking Neural Networks (SNNs) with SNN-specific Transformer components. It reports competitive benchmark results and may offer better energy efficiency and biological plausibility than conventional LLMs.
Contribution
This work designs SNN-compatible Transformer components, implements an SNN approximation of SiLU, and develops a synaptic plasticity module, advancing neuromorphic NLP models.
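As an illustration only, one generic way to approximate SiLU with spikes is to interpolate it piecewise-linearly and let the firing rate of an integrate-and-fire neuron stand in for each ReLU term. This summary does not describe how the paper's SNNSiLU is built, so every name and parameter below (if_rate, spiking_silu, the knot range, the segment and step counts) is an assumption, not the authors' design.

```python
import math


def silu(x: float) -> float:
    """Reference SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def if_rate(current: float, threshold: float, steps: int) -> float:
    """Integrate-and-fire neuron with soft reset, driven by a constant current.

    Returns spike_count * threshold / steps, which approximates
    max(current, 0) as long as current <= threshold.
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += current
        if v >= threshold:
            v -= threshold  # soft reset keeps the residual charge
            spikes += 1
    return spikes * threshold / steps


def spiking_silu(x: float, lo: float = -6.0, hi: float = 6.0,
                 segments: int = 24, steps: int = 1000) -> float:
    """Hypothetical rate-coded SiLU, meaningful only for x in [lo, hi]."""
    width = (hi - lo) / segments
    knots = [lo + i * width for i in range(segments + 1)]
    x = min(max(x, lo), hi)   # inputs outside the range are clamped
    threshold = hi - lo       # largest current any neuron can receive
    out = silu(lo)
    prev_slope = 0.0
    for i in range(segments):
        slope = (silu(knots[i + 1]) - silu(knots[i])) / width
        # Each slope change is weighted by a spiking estimate of relu(x - knot).
        out += (slope - prev_slope) * if_rate(x - knots[i], threshold, steps)
        prev_slope = slope
    return out


if __name__ == "__main__":
    for value in (-3.0, 0.5, 2.0):
        print(value, silu(value), spiking_silu(value))
```

In this sketch the error comes from two sources: the piecewise-linear interpolation (reduced by more segments) and the spike-count quantization (reduced by more time steps).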
Findings
BrainTransformers-3B-Chat scores 63.2 on MMLU
It also reports 54.1 on BBH, 54.3 on ARC-C, and 76.3 on GSM8K
The SNN design may improve energy efficiency and biological plausibility
Abstract
This study introduces BrainTransformers, a Large Language Model (LLM) implemented using Spiking Neural Networks (SNNs). Our key contributions include: (1) designing SNN-compatible Transformer components such as SNNMatmul, SNNSoftmax, and SNNSiLU; (2) implementing an SNN approximation of the SiLU activation function; and (3) developing a Synapsis module to simulate synaptic plasticity. Our 3-billion-parameter model, BrainTransformers-3B-Chat, demonstrates competitive performance across various benchmarks, including MMLU (63.2), BBH (54.1), ARC-C (54.3), and GSM8K (76.3), while potentially offering improved energy efficiency and biological plausibility. The model employs a three-stage training approach, including SNN-specific neuronal synaptic plasticity training. This research opens new avenues for brain-like AI systems in natural language processing and neuromorphic…
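To make the idea of an SNN-compatible matrix product more concrete, here is a minimal rate-coded sketch in which each input spike triggers an addition of a weight, so no multiplications are needed during accumulation. It is not the paper's SNNMatmul, whose construction this summary does not give. The names encode and spiking_matvec, the restriction of inputs to [0, 1], and the step count are all assumptions made for illustration.

```python
def encode(x: float, steps: int) -> list:
    """Deterministic rate code for x in [0, 1].

    An integrate-and-fire neuron with threshold 1 emits roughly
    x * steps spikes over the window.
    """
    v = 0.0
    train = []
    for _ in range(steps):
        v += x
        if v >= 1.0:
            v -= 1.0
            train.append(1)
        else:
            train.append(0)
    return train


def spiking_matvec(weights: list, inputs: list, steps: int = 1000) -> list:
    """Hypothetical spike-driven matrix-vector product (assumed design)."""
    trains = [encode(x, steps) for x in inputs]
    outputs = []
    for row in weights:
        acc = 0.0
        for t in range(steps):
            for w, train in zip(row, trains):
                if train[t]:
                    acc += w  # a spike adds the weight; no multiply needed
        outputs.append(acc / steps)  # average over the window
    return outputs


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]]
    x = [0.2, 0.4, 0.9]
    print(spiking_matvec(W, x))  # close to the exact product [1.5, 1.5]
```

The approximation error in this sketch shrinks roughly in proportion to 1/steps, which is the usual trade-off between latency and precision in rate-coded schemes.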
Taxonomy
Topics: Functional Brain Connectivity Studies
Methods: Attention Is All You Need · Dense Connections · Sigmoid Linear Unit · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Softmax · Spiking Neural Networks
