BrainVista: Modeling Naturalistic Brain Dynamics as Multimodal Next-Token Prediction
Xuanhua Yin, Runkai Zhao, Lina Yao, Weidong Cai

TL;DR
BrainVista is a novel multimodal autoregressive framework that models causal brain dynamics from sensory inputs, improving fMRI encoding and long-term prediction accuracy by capturing inter-network information flow and aligning stimuli with neural signals.
Contribution
BrainVista introduces Network-wise Tokenizers, a Spatial Mixer Head, and a Stimulus-to-Brain (S2B) masking mechanism to strengthen causal modeling of brain activity from multimodal data.
Findings
Achieved state-of-the-art fMRI encoding performance on multiple datasets.
Improved long-horizon pattern prediction correlation by over 33%.
Validated effectiveness across Algonauts 2025, CineBrain, and HAD datasets.
Abstract
Naturalistic fMRI characterizes the brain as a dynamic predictive engine driven by continuous sensory streams. However, modeling the causal forward evolution in realistic neural simulation is impeded by the timescale mismatch between multimodal inputs and the complex topology of cortical networks. To address these challenges, we introduce BrainVista, a multimodal autoregressive framework designed to model the causal evolution of brain states. BrainVista incorporates Network-wise Tokenizers to disentangle system-specific dynamics and a Spatial Mixer Head that captures inter-network information flow without compromising functional boundaries. Furthermore, we propose a novel Stimulus-to-Brain (S2B) masking mechanism to synchronize high-frequency sensory stimuli with hemodynamically filtered signals, enabling strict, history-only causal conditioning. We validate our framework on Algonauts…
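To make the idea of history-only conditioning concrete, here is a minimal sketch of a stimulus-to-brain attention mask. It is not the authors' implementation. It assumes that stimulus tokens arrive at a fixed multiple of the fMRI sampling rate and that the brain token for a given TR may see only stimulus tokens from that TR or earlier. The names `s2b_mask`, `num_trs`, `stim_per_tr`, and `lag` are hypothetical and chosen for illustration.

```python
def s2b_mask(num_trs, stim_per_tr, lag=0):
    """Build a boolean stimulus-to-brain mask (illustrative sketch only).

    Row t is the brain token at TR t; column s is stimulus token s.
    Assumption: stimulus token s falls inside TR index s // stim_per_tr.
    An entry is True when the stimulus token's TR is at most t - lag,
    so no brain token can ever see a stimulus from a future TR.
    """
    num_stim = num_trs * stim_per_tr
    mask = []
    for t in range(num_trs):
        row = [(s // stim_per_tr) <= t - lag for s in range(num_stim)]
        mask.append(row)
    return mask


# Three TRs, two stimulus tokens per TR, current TR allowed (lag=0).
mask = s2b_mask(num_trs=3, stim_per_tr=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# prints:
# 110000
# 111100
# 111111
```

Setting `lag=1` in this sketch restricts each brain token to stimuli from strictly earlier TRs, which is one way to read "strict" conditioning. How the actual S2B mechanism accounts for hemodynamic delay is not specified in the text above.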
Taxonomy
Topics: Functional Brain Connectivity Studies · EEG and Brain-Computer Interfaces · Generative Adversarial Networks and Image Synthesis
