Do self-supervised speech and language models extract similar representations as human brain?

Peili Chen; Linyang He; Li Fu; Lu Fan; Edward F. Chang; Yuanning Li

arXiv:2310.04645 · q-bio.NC · February 1, 2024 · 1 citation

Open Access

TL;DR

This study compares two self-supervised speech and language models, Wav2Vec2.0 and GPT-2, and finds that they predict similar neural responses in the brain's auditory cortex, which points to shared contextual speech representations.

Contribution

The study shows that self-supervised learning (SSL) models trained on speech and on text share neural correlates, indicating that they converge on contextual speech representations that align with brain activity.

Findings

01. Both models accurately predict responses in the auditory cortex.
02. The brain predictions of Wav2Vec2.0 and GPT-2 are significantly correlated.
03. Speech contextual information shared by the two models accounts for most of the explained variance in neural activity.
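The last finding rests on variance partitioning. Below is a minimal sketch of the standard set-theoretic version, assuming ridge-regression encoding models scored by held-out R². This page does not state the paper's exact procedure, and the feature sets `A` and `B`, the helper names, and the synthetic data are all hypothetical stand-ins for Wav2Vec2.0 features, GPT-2 features, and one electrode's response.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit closed-form ridge regression on the training split, predict the test split."""
    n_features = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_features), X_train.T @ y_train)
    return X_test @ w

def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def partition_variance(A, B, y, n_train, alpha=1.0):
    """Split held-out explained variance into parts unique to A, unique to B, and shared."""
    AB = np.hstack([A, B])
    scores = {}
    for name, X in (("A", A), ("B", B), ("AB", AB)):
        pred = ridge_predict(X[:n_train], y[:n_train], X[n_train:], alpha)
        scores[name] = r_squared(y[n_train:], pred)
    return {
        "unique_A": scores["AB"] - scores["B"],
        "unique_B": scores["AB"] - scores["A"],
        "shared": scores["A"] + scores["B"] - scores["AB"],
        "joint": scores["AB"],
    }

# Hypothetical synthetic data: both feature sets carry a common latent signal z
rng = np.random.default_rng(0)
n, n_train = 2000, 1500
z = rng.normal(size=(n, 5))    # shared latent (stand-in for shared speech context)
p_a = rng.normal(size=(n, 5))  # information present only in feature set A
p_b = rng.normal(size=(n, 5))  # information present only in feature set B
A = np.hstack([z + 0.1 * rng.normal(size=(n, 5)), p_a])  # "speech-model" features
B = np.hstack([z + 0.1 * rng.normal(size=(n, 5)), p_b])  # "language-model" features
y = z.sum(axis=1) + 0.3 * p_a.sum(axis=1) + 0.3 * p_b.sum(axis=1) + rng.normal(size=n)

parts = partition_variance(A, B, y, n_train)
print({k: round(float(v), 3) for k, v in parts.items()})
```

Shared variance is R²_A + R²_B − R²_AB, so the three parts sum to the joint model's R² by construction. With held-out R² and noisy data a "unique" part can come out slightly negative, and a real analysis would tune `alpha` per model by cross-validation rather than fixing it at 1.0.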

Abstract

Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech…
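"Brain prediction performance" is commonly measured with a linear encoding model. The sketch below assumes closed-form ridge regression from model features to each electrode's response, scored by Pearson r on held-out samples, and reads "a significant correlation between their brain predictions" as agreement between the two models' per-electrode scores. That reading and every name here (`X_speech`, `X_text`, `encoding_scores`, the synthetic electrodes) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form multi-output ridge: W = (X^T X + alpha*I)^-1 X^T Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def pearson_per_column(Y_true, Y_pred):
    """Pearson r between actual and predicted response, one value per electrode (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))

def encoding_scores(X, Y, n_train, alpha=1.0):
    """Train on the first n_train samples, score prediction accuracy on the rest."""
    W = fit_ridge(X[:n_train], Y[:n_train], alpha)
    return pearson_per_column(Y[n_train:], X[n_train:] @ W)

# Hypothetical synthetic data: 20 electrodes driven to different degrees by a latent z
rng = np.random.default_rng(0)
n, n_train, n_elec = 2000, 1500, 20
z = rng.normal(size=(n, 4))
gains = np.linspace(0.0, 1.5, n_elec)  # how strongly each electrode tracks z
Y = (z @ rng.normal(size=(4, n_elec))) * gains + rng.normal(size=(n, n_elec))
X_speech = z @ rng.normal(size=(4, 16)) + 0.5 * rng.normal(size=(n, 16))  # stand-in for Wav2Vec2.0 features
X_text = z @ rng.normal(size=(4, 32)) + 0.5 * rng.normal(size=(n, 32))    # stand-in for GPT-2 features

r_speech = encoding_scores(X_speech, Y, n_train)
r_text = encoding_scores(X_text, Y, n_train)
# Correlate the two models' per-electrode prediction accuracies across electrodes
agreement = np.corrcoef(r_speech, r_text)[0, 1]
print(round(float(agreement), 3))
```

Correlating per-electrode accuracies is only one way to compare two encoding models; another option is to correlate the two models' predicted time courses electrode by electrode.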

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling

Methods: Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Dropout · Weight Decay · Multi-Head Attention · Softmax · Byte Pair Encoding · Linear Warmup With Cosine Annealing