Scalable Diffusion Transformer for Conditional 4D fMRI Synthesis
Jungwoo Seo, David Keetae Park, Shinjae Yoo, Jiook Cha

TL;DR
This paper introduces a scalable diffusion transformer that generates 4D fMRI sequences conditioned on cognitive tasks. On HCP task fMRI, it reproduces task-evoked brain activation patterns and outperforms a U-Net baseline.
Contribution
The paper presents what the authors describe as the first diffusion transformer for voxelwise 4D fMRI conditional generation, combining 3D VQ-GAN latent compression, a CNN-Transformer backbone, and strong task conditioning via AdaLN-Zero and cross-attention.
Findings
Reproduces task-evoked activation maps with high correlation (0.83).
Preserves inter-task representational structure (RSA 0.98).
Outperforms a U-Net baseline on all reported metrics.
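To make the two headline metrics concrete, here is a minimal NumPy sketch of how a task-map correlation and an RSA score could be computed. It is not the authors' evaluation code: the function names, the correlation-distance RDM, and the use of Pearson correlation between RDMs are all assumptions chosen for illustration, and the paper may define these metrics differently.

```python
import numpy as np


def map_correlation(real_map, synth_map):
    """Pearson correlation between two activation maps (flattened voxelwise)."""
    return float(np.corrcoef(real_map.ravel(), synth_map.ravel())[0, 1])


def rdm(task_maps):
    """Representational dissimilarity matrix, assuming correlation distance.

    task_maps: array of shape (n_tasks, n_voxels), one mean map per task.
    """
    return 1.0 - np.corrcoef(task_maps)


def rsa_score(real_maps, synth_maps):
    """Correlate the upper triangles of the real and synthetic RDMs.

    Pearson is an assumption here; rank correlation is a common alternative.
    """
    iu = np.triu_indices(real_maps.shape[0], k=1)
    return float(np.corrcoef(rdm(real_maps)[iu], rdm(synth_maps)[iu])[0, 1])


# Toy data only: 7 hypothetical task maps with 1000 voxels each.
rng = np.random.default_rng(0)
real = rng.standard_normal((7, 1000))
synth = real + 0.01 * rng.standard_normal((7, 1000))
print(map_correlation(real[0], synth[0]), rsa_score(real, synth))
```

An RSA score near 1 under this definition means the pairwise dissimilarities between tasks are ordered similarly in real and generated data, which is a different property from each generated map matching its real counterpart.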
Abstract
Generating whole-brain 4D fMRI sequences conditioned on cognitive tasks remains challenging due to the high-dimensional, heterogeneous BOLD dynamics across subjects/acquisitions and the lack of neuroscience-grounded validation. We introduce the first diffusion transformer for voxelwise 4D fMRI conditional generation, combining 3D VQ-GAN latent compression with a CNN-Transformer backbone and strong task conditioning via AdaLN-Zero and cross-attention. On HCP task fMRI, our model reproduces task-evoked activation maps, preserves the inter-task representational structure observed in real data (RSA), achieves perfect condition specificity, and aligns ROI time-courses with canonical hemodynamic responses. Performance improves predictably with scale, reaching task-evoked map correlation of 0.83 and RSA of 0.98, consistently surpassing a U-Net baseline on all metrics. By coupling latent…
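The AdaLN-Zero conditioning named in the abstract can be illustrated with a small NumPy sketch. This follows the general idea from diffusion transformers (a conditioning vector is projected to a shift, scale, and gate, with the projection zero-initialized so the block starts as the identity); it is not the paper's implementation. The shapes, the `sublayer` stand-in, and all names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Layer normalization over the last axis, without learned affine terms."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def adaln_zero_block(x, cond, W, b, sublayer):
    """One AdaLN-Zero style residual block (illustrative sketch).

    x:    (batch, tokens, d) latent tokens
    cond: (batch, c) conditioning embedding, e.g. task plus timestep
    W, b: projection of shape (c, 3 * d) and bias (3 * d,), zero-initialized
    sublayer: any function mapping (batch, tokens, d) -> (batch, tokens, d)
    """
    shift, scale, gate = np.split(cond @ W + b, 3, axis=-1)
    h = layer_norm(x) * (1.0 + scale[:, None, :]) + shift[:, None, :]
    return x + gate[:, None, :] * sublayer(h)


# Toy shapes only: batch 2, 5 tokens, width 8, conditioning width 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))
cond = rng.standard_normal((2, 4))
W0, b0 = np.zeros((4, 24)), np.zeros(24)
out_init = adaln_zero_block(x, cond, W0, b0, np.tanh)  # tanh stands in for attention/MLP
```

With the zero-initialized projection the gate is zero, so the block returns its input unchanged; conditioning only starts to influence the output once training moves `W` and `b` away from zero.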
Taxonomy
Topics: Functional Brain Connectivity Studies · Generative Adversarial Networks and Image Synthesis · EEG and Brain-Computer Interfaces
