The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones

TL;DR
This paper introduces a neuroscience-informed self-supervised learning approach for decoding speech from brain activity. The approach scales across heterogeneous datasets and subjects, improves on state-of-the-art models by 15-27%, and generalizes across participants, datasets, and tasks.
Contribution
It develops a novel architecture and objectives for learning from heterogeneous brain recordings, enabling scalable and generalizable speech decoding models.
Findings
Achieves 15-27% improvement over state-of-the-art models.
Generalizes across participants, datasets, and tasks.
Matches surgical decoding performance with non-invasive data.
Abstract
The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and scanning hardware, leveraging data across subjects and datasets remains challenging. In turn, the field has not benefited from the growing number of open neural data repositories to exploit large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems
Methods: Sparse Evolutionary Training
