Do self-supervised speech and language models extract similar representations as human brain?
Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

TL;DR
This study compares two self-supervised models, Wav2Vec2.0 (speech) and GPT-2 (language), and finds that both predict speech responses in the auditory cortex, that their brain predictions are significantly correlated, and that this overlap is driven largely by shared contextual speech representations.
Contribution
The paper demonstrates that SSL models trained on speech and on language tasks correlate with the same aspects of neural activity, which points to convergent contextual speech representations in the two models and in the brain.
Findings
Both models predict auditory cortex responses accurately.
Significant correlation between Wav2Vec2.0 and GPT-2 brain predictions.
Shared speech context accounts for most of the explained variance in neural activity, more than static semantic or lower-level acoustic-phonetic information.
Abstract
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech…
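The abstract does not spell out how the variance shared between the two models is measured. As a rough illustration only, the sketch below uses a common variance-partitioning approach with linear encoding models: fit ridge regressions on each feature set and on their concatenation, then derive the shared part from the held-out R² values. The ridge estimator, the train/test split, the function names, and the synthetic data are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression in closed form and predict on held-out data."""
    # Center with training statistics only, so no test information leaks in.
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    yc = y_train - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    return (X_test - x_mean) @ weights + y_mean

def r2(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def partition_variance(feat_a, feat_b, y, n_train, alpha=1.0):
    """Split explained variance into shared and unique parts (hypothetical helper)."""
    train, test = slice(0, n_train), slice(n_train, None)

    def score(X):
        pred = ridge_fit_predict(X[train], y[train], X[test], alpha)
        return r2(y[test], pred)

    r2_a = score(feat_a)
    r2_b = score(feat_b)
    r2_joint = score(np.hstack([feat_a, feat_b]))
    return {
        "shared": r2_a + r2_b - r2_joint,   # explained by either model alone
        "unique_a": r2_joint - r2_b,        # explained only by feature set A
        "unique_b": r2_joint - r2_a,        # explained only by feature set B
        "joint": r2_joint,
    }

# Synthetic stand-ins: a latent "context" signal appears in both feature sets
# and drives the simulated neural response; the extra columns are pure noise.
rng = np.random.default_rng(0)
n_samples = 2000
context = rng.standard_normal((n_samples, 3))
feat_a = np.hstack([context, rng.standard_normal((n_samples, 2))])
feat_b = np.hstack([context, rng.standard_normal((n_samples, 2))])
response = context @ np.array([1.0, -0.5, 0.8]) + 0.5 * rng.standard_normal(n_samples)

parts = partition_variance(feat_a, feat_b, response, n_train=1500)
print(parts)
```

In this toy setup the shared term dominates because both feature sets carry the same latent signal; with real model activations and neural recordings the regularization strength would normally be chosen by cross-validation rather than fixed.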
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling
Methods: Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Dropout · Weight Decay · Multi-Head Attention · Softmax · Byte Pair Encoding · Linear Warmup With Cosine Annealing
