LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
Miran Özdogan, Gilad Landau, Gereon Elvers, Dulhan Jayalath, Pratik Somaiya, Francesco Mantegna, Mark Woolrich, Oiwi Parker Jones

TL;DR
LibriBrain is the largest single-subject MEG dataset for speech decoding, enabling new research into neural representations and improving decoding methods through extensive data and baseline benchmarks.
Contribution
The paper introduces LibriBrain, a large-scale, high-quality MEG dataset with detailed annotations and baseline results, supporting advancements in neural speech decoding.
Findings
Increasing training data improves decoding performance
Large within-subject datasets enhance neural decoding accuracy
Baseline results establish benchmarks for future research
Abstract
LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings: 5× larger than the next comparable dataset and 50× larger than most. This unprecedented 'depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline…
Taxonomy
Topics: Speech and dialogue systems
