How does longer temporal context enhance multimodal narrative video processing in the brain?
Prachi Jindal, Anant Khandelwal, Manish Gupta, Bapi S. Raju, Subba Reddy Oota, Tanmoy Chakraborty

TL;DR
This study explores how increasing the temporal context length of narrative videos enhances brain-model alignment, revealing different neural and model responses across timescales and tasks during movie watching.
Contribution
It demonstrates that longer video clips improve alignment with multimodal large language models and reveals a hierarchy of neural regions tuned to different temporal windows.
Findings
Longer clips improve brain alignment for multimodal models.
Shorter windows align with perceptual and early language regions.
Task prompts induce region-dependent, context-sensitive brain patterns.
Abstract
Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study investigates how the temporal context length of video clips (3–24 s) and narrative-task prompting shape brain-model alignment during naturalistic movie watching. Using fMRI recordings from participants viewing full-length movies, we examine how brain regions sensitive to narrative context dynamically represent information over varying timescales and how these neural patterns align with model-derived features. We find that increasing clip duration substantially improves brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain. Further, shorter temporal windows align with perceptual and early language regions, while longer windows…
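The abstract does not say how brain-model alignment is scored. A common choice in the encoding-model literature is to fit a ridge regression from model features to voxel responses and report the per-voxel Pearson correlation on held-out data. The sketch below illustrates that general idea on synthetic data only. The function names (`ridge_fit`, `columnwise_pearson`, `brain_alignment`), the regularization strength, and the train/test split are all assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights: W = (X^T X + alpha * I)^(-1) X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom

def brain_alignment(features, responses, alpha=1.0, train_frac=0.8):
    """Hypothetical alignment score: held-out correlation per voxel.

    features:  (n_timepoints, n_features) model-derived features
    responses: (n_timepoints, n_voxels) fMRI responses
    """
    n_train = int(train_frac * len(features))
    X_tr, X_te = features[:n_train], features[n_train:]
    Y_tr, Y_te = responses[:n_train], responses[n_train:]
    # Standardize features using training statistics only.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-8
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd
    # Center responses so no explicit intercept column is needed.
    y_mu = Y_tr.mean(axis=0)
    W = ridge_fit(X_tr, Y_tr - y_mu, alpha)
    pred = X_te @ W + y_mu
    return columnwise_pearson(pred, Y_te)

# Synthetic stand-ins for model features and voxel responses (not real data).
rng = np.random.default_rng(0)
n_timepoints, n_features, n_voxels = 400, 16, 5
features = rng.standard_normal((n_timepoints, n_features))
true_W = rng.standard_normal((n_features, n_voxels))
noise = 0.5 * rng.standard_normal((n_timepoints, n_voxels))
responses = features @ true_W + noise

scores = brain_alignment(features, responses)
# Control: shuffling feature rows breaks the feature-response pairing.
shuffled_scores = brain_alignment(rng.permutation(features), responses)
```

Under these assumptions, comparing temporal context lengths would mean extracting one feature matrix per clip duration (for example, 3 s versus 24 s) and comparing the resulting per-voxel scores region by region. The shuffled control gives a rough chance-level baseline.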
Taxonomy
Topics: Action Observation and Synchronization · Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism
