A cross-species neural foundation model for end-to-end speech decoding

Yizi Zhang; Linyang He; Chaofei Fan; Tingkai Liu; Han Yu; Trung Le; Jingyuan Li; Scott Linderman; Lea Duncker; Francis R Willett; Nima Mesgarani; Liam Paninski

arXiv:2511.21740·cs.CL·May 15, 2026


TL;DR

This paper introduces BIT, an end-to-end framework that decodes text from neural activity. Its pretrained encoder sets a new state of the art on the Brain-to-Text '24 and '25 benchmarks, and its representations transfer to both attempted and imagined speech.

Contribution

The paper presents a cross-task, cross-species pretrained neural encoder integrated into an end-to-end speech decoding model. The encoder sets a new state of the art on the Brain-to-Text '24 and '25 benchmarks, and the reported word error rate falls from 24.69% to 10.22%.

Findings

01 Achieved a new state of the art on the Brain-to-Text '24 and '25 benchmarks.

02 Reduced word error rate from 24.69% to 10.22%.

03 Small-scale audio LLMs improve decoding performance.
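For readers unfamiliar with the metric in finding 02, word error rate (WER) is conventionally the word-level edit distance between a decoded sentence and its reference, divided by the number of reference words. The sketch below illustrates that standard definition and the relative reduction implied by the two reported figures. It is not the paper's evaluation code: the function names `word_error_rate` and `relative_reduction` and the example sentences are hypothetical, and the benchmark's text normalization (case, punctuation) is not described here, so none is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error removed, e.g. 0.5 means the error halved."""
    return (before - after) / before


# Hypothetical example: one substituted word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")

# Relative reduction implied by the reported WER figures (24.69% to 10.22%).
reported_reduction = relative_reduction(24.69, 10.22)
```

On the reported numbers, the drop from 24.69% to 10.22% is an absolute reduction of 14.47 percentage points, or roughly 58.6% of the original error.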

Abstract

Speech brain-computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phonemes before assembling sentences with an n-gram language model (LM), which prevents joint optimization of all stages. Here, we introduce an end-to-end BraIn-to-Text (BIT) framework that translates neural activity into coherent sentences using a single differentiable neural network. Central to our approach is a cross-task, cross-species pretrained neural encoder, whose representations transfer to both attempted and imagined speech. In a cascaded setting with an n-gram LM, the pretrained encoder establishes a new state-of-the-art (SOTA) on the Brain-to-Text '24 and '25 benchmarks. Integrated end-to-end with audio large language models (LLMs) and trained with contrastive learning for…


Videos

A cross-species neural foundation model for end-to-end speech decoding · SlidesLive