Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain
Omer Moussa, Mariya Toneva

TL;DR
This study shows that brain-tuned speech models (speech models fine-tuned on human brain recordings) mirror the brain's hierarchical stages of speech processing better than pretrained models, especially in semantic language regions.
Contribution
The paper demonstrates that fine-tuning speech models on brain recordings makes their layers better reflect the hierarchy of human speech processing.
Findings
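Late layers of brain-tuned models align substantially better with semantic language regions than those of pretrained models.
Early layers remain dedicated to low-level acoustic features.
Brain-tuned models exhibit a well-defined hierarchical processing structure, with late layers performing best on complex high-level tasks.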
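The abstract attributes these layer-wise findings to probing each layer separately. As a rough illustration only (a minimal sketch, not the paper's probing setup), one can fit a ridge-regularized linear probe on each layer's features and compare held-out accuracy across layers. The helper `probe_accuracy`, the 80/20 random split, `alpha=1.0`, and the synthetic "layers" are hypothetical choices for this example; real features would come from a speech model's hidden states and real labels from an acoustic or semantic task.

```python
import numpy as np

def probe_accuracy(features, labels, train_frac=0.8, alpha=1.0, seed=0):
    """Held-out accuracy of a ridge-regularized linear probe (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Append a constant column so the probe has a bias term
    X = np.hstack([features, np.ones((n, 1))])
    idx = rng.permutation(n)
    n_train = int(train_frac * n)
    tr, te = idx[:n_train], idx[n_train:]
    n_classes = int(labels.max()) + 1
    Y = np.eye(n_classes)[labels[tr]]  # one-hot targets for the training rows
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + alpha * I)^-1 X^T Y
    W = np.linalg.solve(X[tr].T @ X[tr] + alpha * np.eye(d), X[tr].T @ Y)
    # Predict the class with the highest linear score
    preds = np.argmax(X[te] @ W, axis=1)
    return float(np.mean(preds == labels[te]))

# Synthetic demo: three hypothetical "layers" that encode a 4-way label
# with increasing strength (0.0 means the label is absent from that layer)
rng = np.random.default_rng(0)
n, d, n_classes = 400, 16, 4
labels = rng.integers(0, n_classes, size=n)
class_vecs = rng.standard_normal((n_classes, d))
strengths = [0.0, 0.5, 2.0]
layer_feats = [rng.standard_normal((n, d)) + s * class_vecs[labels] for s in strengths]

accs = [probe_accuracy(X, labels) for X in layer_feats]
print("probe accuracy per layer:", [round(a, 2) for a in accs])
```

Higher held-out accuracy at a layer suggests the probed property is more linearly decodable there; tracing accuracy across layers is what makes a hierarchy (acoustic early, high-level late) visible.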
Abstract
Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. Here, we examine how well brain-tuned models further reflect the brain's intermediate stages of speech processing. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions. Further layer-wise probing reveals that early layers remain dedicated to low-level acoustic features, while late layers become the best at complex high-level tasks. These findings show that brain-tuned models not only perform better but also exhibit a well-defined hierarchical processing going from…
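Alignment between a model layer and a brain region is often quantified with a voxelwise encoding model: regress each voxel's response on one layer's features, score the held-out predictions by Pearson correlation, and average over the region's voxels. The sketch below illustrates that general technique under simplifying assumptions and is not the authors' exact pipeline: features are assumed to be already resampled to fMRI time points (TRs), hemodynamic delays are ignored, the ridge penalty `alpha` is fixed rather than cross-validated, and `layer_alignment`, `roi_mask`, and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights mapping features X to all voxels Y at once."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))

def layer_alignment(layer_feats, voxels, roi_mask, alpha=10.0, train_frac=0.8):
    """Mean held-out Pearson r over ROI voxels, one score per layer (hypothetical helper)."""
    n_train = int(train_frac * voxels.shape[0])
    # Contiguous train/test split, since fMRI time series are autocorrelated
    scores = []
    for X in layer_feats:
        W = ridge_fit(X[:n_train], voxels[:n_train], alpha)
        r = pearson_per_column(X[n_train:] @ W, voxels[n_train:])
        scores.append(float(r[roi_mask].mean()))
    return scores

# Synthetic demo: only the last hypothetical layer drives the ROI voxels
rng = np.random.default_rng(1)
n_trs, d, n_vox = 500, 20, 50
layer_feats = [rng.standard_normal((n_trs, d)) for _ in range(3)]
roi_mask = np.zeros(n_vox, dtype=bool)
roi_mask[:25] = True  # stand-in for a "semantic" region of interest
voxels = 0.5 * rng.standard_normal((n_trs, n_vox))
voxels[:, roi_mask] += layer_feats[2] @ rng.standard_normal((d, 25))

scores = layer_alignment(layer_feats, voxels, roi_mask)
print("mean ROI correlation per layer:", [round(s, 2) for s in scores])
```

Comparing such per-layer scores between a pretrained and a brain-tuned model within a semantic region is one way to express the kind of late-layer improvement the abstract reports.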
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Neural dynamics and brain function · Phonetics and Phonology Research
