Mind the Gap: Aligning the Brain with Language Models Requires a Nonlinear and Multimodal Approach
Danny Dongyeop Han, Yunju Cho, Jiook Cha, Jay-Yoon Lee

TL;DR
This paper introduces a nonlinear, multimodal brain prediction model combining audio and linguistic features, significantly improving prediction accuracy and revealing neural integration of auditory and semantic information during speech comprehension.
Contribution
It presents a novel nonlinear, multimodal approach that outperforms traditional linear models in predicting brain responses to speech, advancing neurolinguistic modeling.
Findings
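17.2% and 17.9% improvement over traditional unimodal linear models (unnormalized and normalized correlation, respectively)
7.7% and 14.4% improvement over prior state-of-the-art models (same two metrics)
Reveals how auditory and semantic information are integrated in brain regions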
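The paper reports these gains for both unnormalized and normalized correlation between predicted and measured brain responses. The sketch below shows how such figures are commonly computed. It makes three assumptions: that "improvement" means relative percent gain over the baseline score, that correlation is computed per voxel, and that "normalized" means dividing by a per-voxel noise ceiling. The function names are hypothetical, and the input numbers are illustrative only, not the paper's results.

```python
import numpy as np

def voxelwise_pearson(y_true, y_pred):
    """Pearson r between measured and predicted responses, one value per voxel (column)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den

def normalized_correlation(r, noise_ceiling):
    """Assumed normalization: raw r divided by a per-voxel noise ceiling."""
    return r / noise_ceiling

def relative_improvement(score_new, score_baseline):
    """Percent gain of a new model's score over a baseline's score."""
    return 100.0 * (score_new - score_baseline) / score_baseline

# Hypothetical mean correlations, chosen only to illustrate the arithmetic.
print(round(relative_improvement(0.12, 0.10), 1))  # 20.0
```

Under this reading, a model whose mean correlation rises from 0.10 to 0.12 would count as a 20% improvement. The same relative gain can differ between raw and noise-ceiling-normalized scores, which is why the paper reports both.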
Abstract
Self-supervised language and audio models effectively predict brain responses to speech. However, traditional prediction models rely on linear mappings from unimodal features, despite the complex integration of auditory signals with linguistic and semantic information across widespread brain networks during speech comprehension. Here, we introduce a nonlinear, multimodal prediction model that combines audio and linguistic features from pre-trained models (e.g., LLaMA, Whisper). Our approach achieves a 17.2% and 17.9% improvement in prediction performance (unnormalized and normalized correlation) over traditional unimodal linear models, as well as a 7.7% and 14.4% improvement, respectively, over prior state-of-the-art models. These improvements represent a major step towards future robust in-silico testing and improved decoding performance. They also reveal how auditory and semantic…
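The core argument is that a linear map from a single modality cannot capture responses that depend on how auditory and semantic information combine. The toy sketch below illustrates that argument on synthetic data. This summary does not specify the paper's architecture, so several things here are stand-ins. Random Gaussian vectors replace Whisper and LLaMA embeddings. Each toy "voxel" responds to the product of an audio projection and a text projection. An explicit audio-by-text interaction expansion followed by closed-form ridge regression stands in for the paper's nonlinear model. `ridge_fit`, `cross_modal_features`, and the other helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def pearson_per_voxel(Y, Y_hat):
    """Pearson r for each voxel (column) between measured and predicted responses."""
    Yc = Y - Y.mean(axis=0)
    Pc = Y_hat - Y_hat.mean(axis=0)
    return (Yc * Pc).sum(axis=0) / np.sqrt((Yc ** 2).sum(axis=0) * (Pc ** 2).sum(axis=0))

def cross_modal_features(A, T):
    """Hypothetical nonlinear expansion: both modalities plus all audio x text products."""
    inter = np.einsum("ni,nj->nij", A, T).reshape(len(A), -1)
    return np.hstack([A, T, inter])

# Synthetic stand-ins for audio (e.g. Whisper) and text (e.g. LLaMA) embeddings.
n_train, n_test, d_audio, d_text, n_voxels = 2000, 500, 8, 8, 3
n = n_train + n_test
A = rng.standard_normal((n, d_audio))
T = rng.standard_normal((n, d_text))

# Each toy voxel responds to the product of an audio and a text projection, so
# no linear function of either modality, or of both concatenated, can predict it.
wa = rng.standard_normal((d_audio, n_voxels))
wa /= np.linalg.norm(wa, axis=0)
wt = rng.standard_normal((d_text, n_voxels))
wt /= np.linalg.norm(wt, axis=0)
Y = (A @ wa) * (T @ wt) + 0.1 * rng.standard_normal((n, n_voxels))

tr, te = slice(0, n_train), slice(n_train, None)

def evaluate(X):
    """Fit on the training split, return per-voxel test correlation."""
    W = ridge_fit(X[tr], Y[tr])
    return pearson_per_voxel(Y[te], X[te] @ W)

r_linear_audio = evaluate(A)
r_linear_text = evaluate(T)
r_linear_multi = evaluate(np.hstack([A, T]))
r_nonlinear_multi = evaluate(cross_modal_features(A, T))

for name, r in [("linear audio", r_linear_audio), ("linear text", r_linear_text),
                ("linear multimodal", r_linear_multi),
                ("nonlinear multimodal", r_nonlinear_multi)]:
    print(f"{name:>20}: {np.round(r, 2)}")
```

In this contrived setting, both unimodal linear models and the concatenated linear model score near zero. The interaction model recovers most of the signal. Real brain data is far less clean, and a learned network (e.g. an MLP) would replace the hand-built interaction terms. The sketch only shows why multimodality without nonlinearity, or nonlinearity without multimodality, can fall short.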
Taxonomy
Topics: Speech and dialogue systems
