LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale

Miran Özdogan; Gilad Landau; Gereon Elvers; Dulhan Jayalath; Pratik Somaiya; Francesco Mantegna; Mark Woolrich; Oiwi Parker Jones

arXiv:2506.02098 · cs.LG · June 4, 2025

TL;DR

LibriBrain is the largest single-subject MEG dataset for speech decoding, enabling new research into neural representations and improving decoding methods through extensive data and baseline benchmarks.

Contribution

The paper introduces LibriBrain, a large-scale, high-quality MEG dataset with detailed annotations and baseline results, supporting advancements in neural speech decoding.

Findings

01. Increasing training data improves decoding performance

02. Large within-subject datasets enhance neural decoding accuracy

03. Baseline results establish benchmarks for future research

Abstract

LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings: 5× larger than the next comparable dataset and 50× larger than most. This unprecedented 'depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline…

Code & Models

Datasets

pnpl/LibriBrain
dataset · 2.0k downloads

Videos

LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale · SlidesLive

Taxonomy

Topics: Speech and dialogue systems