Mind the Gap: Aligning the Brain with Language Models Requires a Nonlinear and Multimodal Approach

Danny Dongyeop Han; Yunju Cho; Jiook Cha; Jay-Yoon Lee

arXiv:2502.12771·cs.CL·February 19, 2025

TL;DR

This paper introduces a nonlinear, multimodal brain prediction model combining audio and linguistic features, significantly improving prediction accuracy and revealing neural integration of auditory and semantic information during speech comprehension.

Contribution

It presents a novel nonlinear, multimodal approach that outperforms traditional linear models in predicting brain responses to speech, advancing neurolinguistic modeling.

Findings

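01 17.2% (unnormalized correlation) and 17.9% (normalized correlation) improvement in prediction performance over traditional unimodal linear models

02 7.7% and 14.4% improvement, respectively, over prior state-of-the-art models

03 Reveals integration of auditory and semantic information in brain regions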

Abstract

Self-supervised language and audio models effectively predict brain responses to speech. However, traditional prediction models rely on linear mappings from unimodal features, despite the complex integration of auditory signals with linguistic and semantic information across widespread brain networks during speech comprehension. Here, we introduce a nonlinear, multimodal prediction model that combines audio and linguistic features from pre-trained models (e.g., LLAMA, Whisper). Our approach achieves a 17.2% and 17.9% improvement in prediction performance (unnormalized and normalized correlation) over traditional unimodal linear models, as well as a 7.7% and 14.4% improvement, respectively, over prior state-of-the-art models. These improvements represent a major step towards future robust in-silico testing and improved decoding performance. They also reveal how auditory and semantic…
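
The contrast the abstract draws, a unimodal linear mapping versus a nonlinear mapping over combined audio and language features, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic, the feature sizes are arbitrary, and the nonlinearity is a fixed random tanh hidden layer with a ridge readout, used as a simple stand-in for a trained network. It is not the authors' architecture, and it does not use LLAMA or Whisper features.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def pearson_per_voxel(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, one value per column."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.maximum(denom, 1e-12)

# Synthetic stand-ins (hypothetical sizes): rows are time points,
# columns are feature dimensions or voxels.
rng = np.random.default_rng(0)
n_train, n_test = 400, 200
d_audio, d_text, n_voxels, n_hidden = 8, 8, 5, 256
n = n_train + n_test
audio = rng.standard_normal((n, d_audio))
text = rng.standard_normal((n, d_text))

# Toy "brain responses" that depend nonlinearly on BOTH modalities, plus noise.
A = rng.standard_normal((d_audio, n_voxels)) / np.sqrt(d_audio)
B = rng.standard_normal((d_text, n_voxels)) / np.sqrt(d_text)
Y = np.tanh(audio @ A) + np.tanh(text @ B) + 0.1 * rng.standard_normal((n, n_voxels))
train, test = slice(0, n_train), slice(n_train, n)

# Baseline: linear ridge mapping from a single modality (text only).
W_uni = ridge_fit(text[train], Y[train], alpha=10.0)
r_uni = pearson_per_voxel(Y[test], text[test] @ W_uni)

# Multimodal nonlinear model: concatenate both modalities, pass them through
# a fixed random tanh layer, then fit a ridge readout on the hidden units.
X = np.concatenate([audio, text], axis=1)
W_hidden = rng.standard_normal((X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
b_hidden = rng.standard_normal(n_hidden)
H = np.tanh(X @ W_hidden + b_hidden)
W_out = ridge_fit(H[train], Y[train], alpha=10.0)
r_multi = pearson_per_voxel(Y[test], H[test] @ W_out)

# Relative improvement in mean held-out correlation, in percent.
gain = 100.0 * (r_multi.mean() - r_uni.mean()) / r_uni.mean()
print(f"unimodal linear r = {r_uni.mean():.3f}")
print(f"multimodal nonlinear r = {r_multi.mean():.3f}")
print(f"relative improvement = {gain:.1f}%")
```

The relative-improvement line shows one common way a percentage gain in correlation is computed; whether the paper's reported percentages use exactly this formula is not stated in the abstract. The size of the gain on this toy data says nothing about the gains reported for real brain recordings.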

Taxonomy

Topics: Speech and dialogue systems