Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy
Weijian Mai, Jian Zhang, Pengfei Fang, Zhijun Zhang

TL;DR
This survey reviews the emerging field of brain-conditional multimodal synthesis, focusing on how brain signals can guide content generation across modalities, and provides a comprehensive taxonomy, datasets, models, and future directions.
Contribution
It offers the first comprehensive overview and taxonomy of brain-conditional multimodal synthesis, including datasets, models, evaluation methods, and future research directions.
Findings
Introduces a taxonomy for AIGC-Brain decoding models
Summarizes key datasets and brain regions involved
Discusses challenges and future prospects
Abstract
In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio, etc.) are gradually reshaping the natural content in the real world. The key to multimodal synthesis technology is to establish the mapping relationship between different modalities. Brain signals, serving as potential reflections of how the brain interprets external information, exhibit a distinctive One-to-Many correspondence with various external modalities. This correspondence makes brain signals emerge as a promising guiding condition for multimodal content synthesis. Brain-conditional multimodal synthesis refers to decoding brain signals back to perceptual experience, which is crucial for developing practical brain-computer interface systems and unraveling complex mechanisms underlying how the brain perceives and…
Taxonomy
Topics: Topic Modeling · Robotics and Automated Systems · Speech and dialogue systems
