Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay
Subba Reddy Oota, Anant Khandelwal, Khushbu Pahwa, Satya Sai Srinath Namburi, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

TL;DR
This study investigates how vision-language and large-action models align with human brain activity during naturalistic gameplay, revealing differences in their internal representations and cortical engagement.
Contribution
The study demonstrates that action-focused and reasoning-focused prompts influence model-brain alignment differently across cortical regions, highlighting the impact of model specialization.
Findings
VLMs and LAMs outperform RL baselines in voxel-wise encoding.
Prompt-driven gains are largest in frontal-parietal and motor regions.
VLMs show prompt-symmetric organization; LAMs show prompt-asymmetric organization.
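Voxel-wise encoding, the evaluation behind the first finding, is commonly set up as a regularized linear map from model features to each voxel's fMRI response, scored by the correlation between predicted and measured responses on held-out data. The following is a minimal sketch of that general recipe under simplifying assumptions. It is not the authors' pipeline. The random "features" stand in for vision-language model (VLM) or large-action model (LAM) representations, and the two "voxels" are hypothetical synthetic signals. Details such as hemodynamic delays, z-scoring, cross-validated regularization strength, and noise-ceiling normalization are omitted.

```python
import math
import random


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment A with the columns of B so both are reduced together.
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ridge(x, y, alpha):
    """Closed-form ridge: W = (X^T X + alpha I)^{-1} X^T Y, one column per voxel."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += alpha
    return solve(gram, matmul(xt, y))


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def voxelwise_scores(x_train, y_train, x_test, y_test, alpha=1.0):
    """Fit ridge on training time points, return held-out Pearson r per voxel."""
    w = fit_ridge(x_train, y_train, alpha)
    pred = matmul(x_test, w)
    return [pearson(p, y) for p, y in zip(transpose(pred), transpose(y_test))]


# Hypothetical synthetic data: 60 time points, 3 feature dimensions, 2 voxels
# whose responses are a noisy linear function of the features.
random.seed(0)
W_TRUE = [[1.0, -0.5], [0.5, 2.0], [-1.0, 0.3]]
features = [[random.gauss(0, 1) for _ in range(3)] for _ in range(60)]
bold = [[v + random.gauss(0, 0.1) for v in row] for row in matmul(features, W_TRUE)]

# Train on the first 40 time points, evaluate on the remaining 20.
scores = voxelwise_scores(features[:40], bold[:40], features[40:], bold[40:], alpha=1.0)
print([round(r, 3) for r in scores])
```

In a comparison like the one summarized above, the same scoring would be repeated for each model's features (and each prompt condition), and the resulting per-voxel correlations compared across models and cortical regions. Real analyses typically use NumPy or scikit-learn rather than hand-written linear algebra; pure Python is used here only to keep the sketch dependency-free.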
Abstract
Understanding how humans and artificial intelligence systems predict and plan by interacting with their environment is a fundamental challenge at the intersection of neuroscience and machine learning. Most brain-encoding studies focus on aligning artificial models with brain activity during language comprehension or passive visual processing, while interactive brain-alignment studies have to date been largely limited to reinforcement-learning (RL) agents and theory-based models. To address this gap, we study brain alignment of representative models from two foundation-model families, namely vision-language models (VLMs) and large-action models (LAMs), using fMRI recordings from participants playing naturalistic Atari-style video games. Specifically, we examine how action-focused and reasoning-focused prompts shape the models' internal representations and their alignment with fMRI brain activity.…
