Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention

Zehao Jin; Yanan Sui

arXiv:2604.00754 · cs.CL · May 6, 2026

TL;DR

This paper introduces Stochastic Attention (SA), a connectome-inspired randomized routing method that randomly permutes the token sequence before sliding-window attention, so that local windows reach global sequence coverage at the same per-layer cost and language model performance improves.

Contribution

It proposes a permutation-based stochastic attention mechanism whose receptive field grows exponentially with depth; it outperforms sliding-window attention (SWA) and matches or exceeds Mixture of Block Attention.

Findings

01

SA achieves full sequence coverage in O(log_w n) layers.

02

Gated SA + SWA improves zero-shot accuracy in language models.

03

SA outperforms SWA and matches or exceeds Mixture of Block Attention.
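
The coverage claim in finding 01 can be checked at toy scale. The following is a minimal sketch, not the authors' code. It assumes a bidirectional window of half-width `w` (the paper may define `w` and causality differently), and `layers_to_full_coverage` is a hypothetical helper name. It tracks which tokens each token can be influenced by after every layer, and counts the layers needed until every token reaches every other token.

```python
import numpy as np

def layers_to_full_coverage(n, w, rng=None, max_layers=1000):
    """Count layers until every token's receptive field spans all n tokens.

    rng=None keeps the original order (plain sliding-window attention);
    otherwise each layer draws an independent random permutation.
    Assumption: bidirectional window, token at position p sees positions
    p-w .. p+w of the (possibly shuffled) sequence.
    """
    identity = np.arange(n)
    reach = np.eye(n, dtype=bool)  # reach[i, j]: token i is influenced by token j
    for layer in range(1, max_layers + 1):
        # pos[i] = position of token i in this layer's ordering
        pos = rng.permutation(n) if rng is not None else identity
        adj = np.abs(pos[:, None] - pos[None, :]) <= w  # who attends to whom
        # Compose this layer's attention graph with the reach accumulated so far.
        reach = (adj.astype(np.int32) @ reach.astype(np.int32)) > 0
        if reach.all():
            return layer
    return None  # not covered within max_layers

if __name__ == "__main__":
    n, w = 256, 4
    print("SWA layers:", layers_to_full_coverage(n, w))
    print("SA layers: ", layers_to_full_coverage(n, w, np.random.default_rng(0)))
```

Without permutations the receptive field widens by only `w` positions per side per layer, so the layer count grows linearly in `n`. With independent permutations it can grow by a factor of up to `2w + 1` per layer, which is the intuition behind the logarithmic depth bound.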

Abstract

The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by this observation, we propose Stochastic Attention (SA), a drop-in enhancement for sliding-window attention (SWA) that applies a random permutation to the token sequence before windowed attention and restores the original order afterward. This transforms the fixed local window into a stochastic global one within the same $O(nw)$ per-layer budget. Through depth, independently sampled permutations yield exponentially growing receptive fields, achieving full sequence coverage in $O(\log_w n)$ …
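
The permute, attend, restore step described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation. It uses a single head and a bidirectional window of half-width `w`. For readability it builds a dense mask, so the sketch itself costs $O(n^2)$ rather than $O(nw)$. The function names are hypothetical.

```python
import numpy as np

def sliding_window_attention(q, k, v, w):
    """Softmax attention where position i only sees positions j with |i - j| <= w.

    Assumption: bidirectional window; a dense mask is used for clarity only.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= w
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def stochastic_attention(q, k, v, w, rng):
    """Shuffle tokens, apply windowed attention, then restore the original order."""
    n = q.shape[0]
    perm = rng.permutation(n)   # shuffled position p holds original token perm[p]
    inv = np.argsort(perm)      # inv[i] = shuffled position of original token i
    out = sliding_window_attention(q[perm], k[perm], v[perm], w)
    return out[inv]             # row i is again the output for original token i

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.normal(size=(3, 16, 8))
    y = stochastic_attention(q, k, v, w=2, rng=np.random.default_rng(1))
    print(y.shape)
```

Because the window is applied in shuffled order, each token's neighbors are a random subset of the whole sequence, while the amount of attention work per token is unchanged. Drawing a fresh permutation in every layer is what lets the receptive field compound with depth.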
