Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing
Chunyu Ye, Yunhao Zhang, Jingyuan Sun, Chong Li, Chengqing Zong, Shaonan Wang

TL;DR
This paper introduces a novel multimodal brain-to-text translation framework that aligns brain signals with a shared semantic space using MLLMs, achieving state-of-the-art results across diverse stimuli and brain signal modalities.
Contribution
The work presents the first unified BCI architecture that decodes multimodal brain activity across various signals and stimuli using multimodal alignment and adaptive routing.
Findings
Achieved an 8.48% improvement on the most commonly used fMRI benchmark.
Demonstrated robustness across fMRI, EEG, and MEG data.
Reported state-of-the-art performance on fMRI datasets with textual, visual, and auditory stimuli.
Abstract
Decoding language from the human brain remains a grand challenge for Brain-Computer Interfaces (BCIs). Current approaches typically rely on unimodal brain representations, neglecting the brain's inherently multimodal processing. Inspired by the brain's associative mechanisms, where viewing an image can evoke related sounds and linguistic representations, we propose a unified framework that leverages Multimodal Large Language Models (MLLMs) to align brain signals with a shared semantic space encompassing text, images, and audio. A router module dynamically selects and fuses modality-specific brain features according to the characteristics of each stimulus. Experiments on various fMRI datasets with textual, visual, and auditory stimuli demonstrate state-of-the-art performance, achieving an 8.48% improvement on the most commonly used benchmark. We further extend our framework to EEG and…
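The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the router's architecture, so the following assumes a simple softmax-gated soft routing: each modality-specific brain feature vector receives a weight, and the fused feature is the weighted sum. The names `softmax`, `route_and_fuse`, `features`, and `gate_logits` are hypothetical, and the gate logits are passed in by hand here, whereas a real router would presumably predict them from the brain signal.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_fuse(
    features: Dict[str, List[float]],
    gate_logits: Dict[str, float],
) -> Tuple[Dict[str, float], List[float]]:
    """Fuse modality-specific features with softmax gate weights.

    Assumption: soft routing (weighted sum), not the paper's exact router.
    All feature vectors must share the same dimensionality.
    """
    names = sorted(features)  # fixed order so weights align with features
    weights = softmax([gate_logits[name] for name in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for name, weight in zip(names, weights):
        if len(features[name]) != dim:
            raise ValueError(f"feature '{name}' has mismatched dimension")
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return dict(zip(names, weights)), fused


# Toy brain features aligned to text, image, and audio spaces (made-up values).
toy_features = {
    "text": [1.0, 0.0],
    "image": [0.0, 1.0],
    "audio": [1.0, 1.0],
}

# Equal logits: every modality contributes equally to the fused feature.
uniform_weights, uniform_fused = route_and_fuse(
    toy_features, {"text": 0.0, "image": 0.0, "audio": 0.0}
)

# A strongly visual stimulus: the gate favors the image-aligned feature.
visual_weights, visual_fused = route_and_fuse(
    toy_features, {"text": 0.0, "image": 5.0, "audio": 0.0}
)
```

With equal logits the fused vector is the plain average of the three features. Raising one logit shifts the fused vector toward that modality, which is the behavior the abstract attributes to stimulus-dependent routing.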
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
