Toward a realistic model of speech processing in the brain with self-supervised learning
Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

TL;DR
This study shows that self-supervised models trained on raw speech can learn brain-like speech representations that align with the cortical hierarchy while requiring a realistic amount of training data.
Contribution
It shows that Wav2Vec 2.0, trained on about 600 hours of speech, learns representations that resemble brain activity during speech perception, narrowing the gap between AI models and neural processes.
Findings
Wav2Vec 2.0 learns brain-like representations from about 600 hours of speech.
The model's layer hierarchy aligns with the cortical hierarchy of speech processing.
Functional specialization in the model mirrors that of cortical regions.
Abstract
Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts of data, (2) unobtainable supervised labels, (3) textual rather than raw sensory input, and / or (4) implausibly large memory (e.g. thousands of contextual words). These elements highlight the need to identify algorithms that, under these limitations, would suffice to account for both behavioral and brain responses. Focusing on the issue of speech processing, we here hypothesize that self-supervised algorithms trained on the raw waveform constitute a promising candidate. Specifically, we compare a recent self-supervised architecture, Wav2Vec 2.0, to the brain activity of 412 English, French, and Mandarin individuals recorded with functional Magnetic…
Taxonomy
Topics: Neural Networks and Applications
