SymbolicLight V1: Spike-Gated Dual-Path Language Modeling with High Activation Sparsity and Sub-Billion-Scale Pre-Training Evidence

Ting Liu

arXiv:2605.21333·cs.CL·May 21, 2026

TL;DR

SymbolicLight V1 is a spike-gated dual-path language model that combines binary Leaky Integrate-and-Fire spike dynamics with a continuous residual stream. A 194M-parameter model reaches validation perplexity 8.88-8.93 at more than 89% activation sparsity on a bilingual Chinese-English corpus.

Contribution

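The paper presents a spike-gated dual-path architecture whose Dual-Path SparseTCAM module replaces dense self-attention with an exponential-decay aggregation path and a spike-gated local attention path. It reports language modeling at high activation sparsity, with pre-training runs of up to 0.8B parameters on 48.8B tokens.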

Findings

01

A 194M-parameter model reaches held-out validation perplexity of 8.88-8.93 across four independent runs at more than 89% per-element activation sparsity on a 3B-token Chinese-English corpus.

02

Spike-gated local attention significantly contributes to model performance.

03

A scaled-up 0.8B-parameter run on 48.8B tokens continues to optimize while preserving activation sparsity.

Abstract

Natively trained spiking language models struggle to combine Transformer-like language quality, stable multi-domain pre-training, and high activation sparsity. We present SymbolicLight V1, a spike-gated dual-path language model that combines binary Leaky Integrate-and-Fire spike dynamics with a continuous residual stream. Its Dual-Path SparseTCAM module replaces dense self-attention with an exponential-decay aggregation path for long-range memory and a spike-gated local attention path for short-range precision, complemented by a dynamic context-conditioned decoding head and a bilingual tokenizer. A 194M-parameter SymbolicLight V1 model trained from scratch on a 3B-token Chinese-English corpus reaches held-out validation PPL 8.88-8.93 across four independent runs at >89% per-element activation sparsity. It trails GPT-2 201M by 7.7% in PPL while surpassing GPT-2 124M under the reported…
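The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the `decay`, `threshold`, and `alpha` values, the hard reset after a spike, and the choice to gate the decay path with spikes are all assumptions made here for illustration. The sketch shows how a Leaky Integrate-and-Fire unit turns a continuous input into binary spikes, how per-element activation sparsity can be measured as the fraction of zeros, and how an exponential-decay running sum carries information forward causally.

```python
import random


def lif_spikes(currents, decay=0.9, threshold=1.0):
    """Run one Leaky Integrate-and-Fire unit over a sequence of inputs.

    Returns a list of binary spikes (0 or 1). `decay` and `threshold`
    are illustrative values, not taken from the paper.
    """
    membrane = 0.0
    spikes = []
    for current in currents:
        membrane = decay * membrane + current  # leaky integration
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing (assumed)
        else:
            spikes.append(0)
    return spikes


def activation_sparsity(spikes):
    """Per-element sparsity: the fraction of activations equal to zero."""
    return 1.0 - sum(spikes) / len(spikes)


def exp_decay_aggregate(values, gates, alpha=0.95):
    """Causal exponential-decay aggregation.

    state_t = alpha * state_{t-1} + gate_t * value_t
    Gating with binary spikes is an assumption of this sketch; a zero
    gate means the position contributes nothing to the running state.
    """
    state = 0.0
    outputs = []
    for value, gate in zip(values, gates):
        state = alpha * state + gate * value
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    currents = [rng.uniform(0.0, 0.3) for _ in range(1000)]
    spikes = lif_spikes(currents)
    memory = exp_decay_aggregate(currents, spikes)
    print("sparsity:", activation_sparsity(spikes))
    print("final memory state:", memory[-1])
```

Because the gates are binary, any position with a zero spike could skip its multiply-accumulate entirely, which is the usual motivation for reporting activation sparsity alongside perplexity.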
