SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference

Yuqi Pan; Jinghao Zhuang; Yupeng Feng; Fangzhi Zhong; Siyu Ding; Xuerui Qiu; Shaowei Gu; Bohan Sun; Zhiyong Qin; Yibo Zhong; Lingtao Ouyang; Kun Yang; Zehao Liu; Yuhong Chou; Shurong Wang; Anjie Hu; Han Xu; Bo Xu; Guoqi Li

arXiv:2604.22575 · cs.LG · April 27, 2026

TL;DR

SpikingBrain2.0 is a brain-inspired 5B foundation model that combines sparse attention with two quantization paths (INT8-Spiking and FP8), targeting efficient long-context inference and cross-platform deployment with low training overhead.

Contribution

The paper proposes Dual-Space Sparse Attention (DSSA) and an optimized training pipeline, aimed at long-context efficiency and multi-platform compatibility for lightweight foundation models.

Findings

01. Achieves over 10x speedup at 4M context length.

02. Supports over 10 million tokens on 8 GPUs.

03. Demonstrates neuromorphic inference with significant area and power reduction.

Abstract

Scaling context length is reshaping large-model development, yet full-attention Transformers suffer from prohibitive computation and inference bottlenecks at long sequences. A key challenge is to design foundation models that maintain performance and long-context efficiency with minimal training overhead. We introduce SpikingBrain2.0 (SpB2.0), a 5B model that advances both architecture and training efficiency of its predecessor. Our contributions are two-fold. (1) Architectural Innovation: We propose Dual-Space Sparse Attention (DSSA), an inter-layer hybrid of Sparse Softmax Attention (MoBA) and Sparse Linear Attention (SSE), achieving an improved performance-efficiency trade-off for long-context modeling. SpB2.0 further supports dual quantization paths: INT8-Spiking coding enables sparse event-driven computation, while FP8 coding accelerates inference on modern GPUs. (2) Enhanced…
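To make the sparse softmax half of DSSA more concrete, here is a minimal NumPy sketch of MoBA-style block selection: keys are grouped into fixed-size blocks, each query scores past blocks by their mean-pooled key, and softmax attention runs only over the top-scoring blocks plus the query's own block. This is an illustration under stated assumptions, not the paper's implementation. The function name `block_sparse_causal_attention`, the parameters `block_size` and `top_k`, and the mean-pooling gate are choices made for this example; the linear-attention (SSE) layers and the inter-layer mixing are not shown.

```python
import numpy as np

def block_sparse_causal_attention(q, k, v, block_size, top_k):
    """Hypothetical single-head sketch of block-sparse causal attention.

    q, k, v: arrays of shape (T, d). top_k counts the query's own block,
    so it must be at least 1. Returns (output, chosen_blocks_per_query).
    """
    T, d = q.shape
    n_blocks = (T + block_size - 1) // block_size
    # Mean-pooled key per block acts as a cheap summary for routing.
    block_keys = np.stack([
        k[b * block_size:(b + 1) * block_size].mean(axis=0)
        for b in range(n_blocks)
    ])
    out = np.zeros((T, v.shape[1]), dtype=np.float64)
    chosen = []
    for t in range(T):
        cur = t // block_size
        # Score only strictly past blocks; the current block is always kept.
        n_extra = min(top_k - 1, cur)
        if n_extra > 0:
            scores = block_keys[:cur] @ q[t]
            extra = np.argsort(scores)[::-1][:n_extra].tolist()
        else:
            extra = []
        blocks = sorted(set(extra) | {cur})
        idx = np.concatenate([
            np.arange(b * block_size, min((b + 1) * block_size, T))
            for b in blocks
        ])
        idx = idx[idx <= t]  # causal mask inside the current block
        logits = k[idx] @ q[t] / np.sqrt(d)
        w = np.exp(logits - logits.max())  # numerically stable softmax
        w /= w.sum()
        out[t] = w @ v[idx]
        chosen.append(blocks)
    return out, chosen

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out, chosen = block_sparse_causal_attention(q, k, v, block_size=4, top_k=2)
```

With `top_k` set to the total number of blocks, every past block is selected and the sketch reduces to ordinary dense causal attention. With a smaller `top_k`, each query touches at most `top_k * block_size` keys regardless of sequence length, which is the source of the long-context savings in this family of methods.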
