A Multimodal Seq2Seq Transformer for Predicting Brain Responses to Naturalistic Stimuli

Qianyi He; Yuan Chang Leong

arXiv:2507.18104·cs.CV·July 28, 2025

Open Access

TL;DR

This paper introduces a multimodal sequence-to-sequence Transformer model that predicts brain responses to naturalistic stimuli by integrating visual, auditory, and language features, capturing long-range temporal dependencies and individual variability.

Contribution

It presents a novel multimodal Transformer architecture with dual cross-attention and a shared encoder, improving brain response prediction across subjects and stimulus types.

Findings

01. Achieved strong performance on in-distribution data
02. Performed well on out-of-distribution data
03. Effectively modeled long-range temporal dependencies

Abstract

The Algonauts 2025 Challenge called on the community to develop encoding models that predict whole-brain fMRI responses to naturalistic multimodal movies. In this submission, we propose a sequence-to-sequence Transformer that autoregressively predicts fMRI activity from visual, auditory, and language inputs. Stimulus features were extracted using pretrained models including VideoMAE, HuBERT, Qwen, and BridgeTower. The decoder integrates information from prior brain states and current stimuli via dual cross-attention mechanisms that attend both to perceptual information extracted from the stimulus and to narrative information provided by high-level summaries of the content. One core innovation of our approach is the use of sequences of multimodal context to predict sequences of brain activity, enabling the model to capture long-range temporal structure in both stimuli and neural…
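To make the decoder description concrete, here is a minimal NumPy sketch of one decoder layer with dual cross-attention. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single attention head, the absence of learned projections and layer normalization, the order of the three attention steps, and all dimensions are hypothetical choices made for brevity. The only parts taken from the abstract are that the decoder conditions on prior brain states and attends separately to perceptual stimulus features and to narrative summary features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if mask is not None:
        # Positions where mask is False get a large negative score,
        # so they receive zero weight after the softmax.
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ values


def dual_cross_attention_layer(brain_states, perceptual, narrative):
    """One hypothetical decoder layer (ordering is an assumption).

    brain_states: (T, d) embeddings of prior brain activity
    perceptual:   (S, d) stimulus features (e.g. video, audio, text)
    narrative:    (N, d) features of high-level content summaries
    """
    T = brain_states.shape[0]
    # Causal mask: time step t may only attend to brain states at steps <= t.
    causal = np.tril(np.ones((T, T), dtype=bool))
    h = brain_states + attention(brain_states, brain_states, brain_states, causal)
    # First cross-attention: query the perceptual stimulus features.
    h = h + attention(h, perceptual, perceptual)
    # Second cross-attention: query the narrative summary features.
    h = h + attention(h, narrative, narrative)
    return h


rng = np.random.default_rng(0)
d_model = 16  # hypothetical embedding size
brain_states = rng.normal(size=(5, d_model))
perceptual = rng.normal(size=(8, d_model))
narrative = rng.normal(size=(3, d_model))

out = dual_cross_attention_layer(brain_states, perceptual, narrative)
print(out.shape)
```

The causal mask is what makes the sketch autoregressive: the output at each time step depends only on brain states at that step or earlier, while both cross-attention steps are left unmasked here as a simplifying assumption.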

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Functional Brain Connectivity Studies · Emotion and Mood Recognition