Task-conditioned probing of instruction-tuned multimodal LLMs: Region-specific brain alignment patterns under naturalistic stimuli
Subba Reddy Oota, Khushbu Pahwa, Prachi Jindal, Satya Sai Srinath Namburi, Maneesh Singh, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

TL;DR
This study investigates how instruction-tuned multimodal large language models (MLLMs) align with brain activity during naturalistic stimuli, revealing that instruction tuning enhances brain-model alignment and task-specific representations.
Contribution
It provides the first comprehensive analysis of instruction-tuned MLLMs' brain alignment across multiple modalities and tasks, highlighting the impact of instruction tuning on neural representation organization.
Findings
Instruction-tuned MLLMs show higher brain alignment (by roughly 15%) than non-instruction-tuned models.
Task-specific MLLM representations vary across brain regions and are associated with higher brain alignment.
In-context learning (ICL) models exhibit strong semantic organization, whereas instruction-tuned (IT) models show only weak coupling to instruction semantics.
Abstract
Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models. More recently, instruction-tuned multimodal (IT) models have been shown to generate task-specific representations that align strongly with brain activity, yet most prior evaluations focus on unimodal stimuli or non-instruction-tuned models under multimodal stimuli. We still lack a clear understanding of whether instruction-tuning is associated with IT-MLLMs organizing their representations around functional task demands or if they simply reflect surface semantics. To address this, we estimate brain alignment by predicting fMRI responses recorded during naturalistic movie watching (video with audio) from MLLM representations. Using instruction-specific embeddings from six video and two audio IT-MLLMs, across…
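As a rough illustration of what "estimating brain alignment by predicting fMRI responses from MLLM representations" can look like, here is a minimal voxel-wise encoding sketch. It assumes ridge regression as the encoder and per-voxel Pearson correlation on held-out time points as the alignment score; both are common choices in the encoding literature, not details confirmed by the abstract. The function names, the synthetic data, and the `alpha` value are all hypothetical.

```python
import numpy as np

def fit_ridge(X_train, Y_train, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X_train.T @ Y_train)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    # Guard against constant columns (zero variance) to avoid division by zero.
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)

def brain_alignment(X, Y, n_train, alpha=1.0):
    """Fit on the first n_train time points, score on the remainder.

    X: (time points, embedding dims) model representations (assumed layout).
    Y: (time points, voxels) fMRI responses (assumed layout).
    """
    X_train, X_test = X[:n_train], X[n_train:]
    Y_train, Y_test = Y[:n_train], Y[n_train:]
    # Center with training statistics only, so no test information leaks in.
    mu_x, mu_y = X_train.mean(axis=0), Y_train.mean(axis=0)
    W = fit_ridge(X_train - mu_x, Y_train - mu_y, alpha)
    Y_pred = (X_test - mu_x) @ W + mu_y
    return voxelwise_pearson(Y_test, Y_pred)

# Synthetic stand-in data: 200 time points, 16-dim embeddings, 5 voxels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
W_true = rng.standard_normal((16, 5))
Y = X @ W_true + 0.1 * rng.standard_normal((200, 5))

scores = brain_alignment(X, Y, n_train=150, alpha=1.0)
```

In a task-conditioned setting, one would presumably repeat this with embeddings extracted under each instruction and compare the resulting per-voxel scores across brain regions; real fMRI analyses also typically account for hemodynamic delay and tune `alpha` by cross-validation, which this sketch omits.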
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Multimodal Machine Learning Applications · Action Observation and Synchronization
Methods: ALIGN
