Unveiling Deep Semantic Uncertainty Perception for Language-Anchored Multi-modal Vision-Brain Alignment

Zehui Feng; Chenqi Zhang; Mingru Wang; Minuo Wei; Shiwei Cheng; Cuntai Guan; Ting Han

arXiv:2511.04078 · cs.CV · November 7, 2025

TL;DR

This paper introduces Bratrix, an end-to-end framework that aligns visual stimuli, neural signals, and language representations in a shared latent space, improving interpretability and robustness on neural-visual-linguistic tasks.

Contribution

Bratrix is the first framework to decouple visual and linguistic semantics for multimodal brain alignment; it also incorporates uncertainty modeling and a two-stage training strategy to improve performance.
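The contribution statement names uncertainty modeling but does not describe the mechanism, so this page cannot tell us how Bratrix implements it. As a generic illustration of the idea only (not Bratrix's method), the sketch below shows learned log-variance weighting, a common way to down-weight noisy samples such as unreliable neural trials. The functions `uncertainty_weighted_loss` and `optimal_log_var` and the loss values are hypothetical.

```python
import numpy as np

def uncertainty_weighted_loss(per_sample_loss, log_var):
    """Mean of exp(-s) * loss + s over the batch, where s is a predicted log-variance.

    A large s shrinks a sample's data term; the +s penalty keeps the model from
    marking every sample as uncertain.
    """
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return float(np.mean(np.exp(-log_var) * per_sample_loss + log_var))

def optimal_log_var(per_sample_loss):
    # For a fixed loss l > 0, setting d/ds [exp(-s) * l + s] = 0 gives s* = log(l),
    # so the effective weight exp(-s*) = 1 / l: noisier, higher-loss samples count less.
    return np.log(np.asarray(per_sample_loss, dtype=float))

# Hypothetical per-trial alignment losses; the last trial stands in for a noisy recording.
losses = np.array([0.5, 1.0, 4.0])
s_star = optimal_log_var(losses)
weights = np.exp(-s_star)                                # approximately [2.0, 1.0, 0.25]
weighted = uncertainty_weighted_loss(losses, s_star)
unweighted = uncertainty_weighted_loss(losses, np.zeros_like(losses))  # equals losses.mean()
print(weights, weighted, unweighted)
```

With the variance term learned jointly, a trial whose loss stays high is gradually treated as unreliable instead of dominating the gradient.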

Findings

01

Outperforms state-of-the-art methods on EEG, MEG, and fMRI tasks.

02

Achieves a 14.3% improvement in EEG retrieval accuracy.

03

Enhances neural-visual-linguistic alignment and interpretability.

Abstract

Unveiling visual semantics from neural signals such as EEG, MEG, and fMRI remains a fundamental challenge due to subject variability and the entangled nature of visual features. Existing approaches primarily align neural activity directly with visual embeddings, but visual-only representations often fail to capture latent semantic dimensions, limiting interpretability and deep robustness. To address these limitations, we propose Bratrix, the first end-to-end framework to achieve multimodal Language-Anchored Vision-Brain alignment. Bratrix decouples visual stimuli into hierarchical visual and linguistic semantic components, and projects both visual and brain representations into a shared latent space, enabling the formation of aligned visual-language and brain-language embeddings. To emulate human-like perceptual reliability and handle noisy neural signals, Bratrix incorporates a novel…
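The abstract describes projecting brain and visual representations into a shared latent space anchored by language. This page gives no loss functions, so the following is only a minimal sketch under that assumption: it uses a CLIP-style symmetric InfoNCE loss (an assumed choice, not confirmed as Bratrix's objective) to pull brain and vision embeddings toward paired text embeddings, plus a top-1 retrieval accuracy of the general kind used to score brain-to-candidate retrieval. The names `symmetric_info_nce`, `language_anchored_loss`, and `top1_retrieval_accuracy`, the temperature of 0.07, and the random toy embeddings are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(a, b, temperature=0.07):
    """CLIP-style contrastive loss; row i of `a` is the positive pair of row i of `b`."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature                            # (N, N) scaled cosine similarities
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # match each a_i among all b
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # match each b_i among all a
    return float(0.5 * (loss_ab + loss_ba))

def language_anchored_loss(brain, vision, text, temperature=0.07):
    # Hypothetical combination: brain and vision embeddings share the same text anchors.
    return (symmetric_info_nce(brain, text, temperature)
            + symmetric_info_nce(vision, text, temperature))

def top1_retrieval_accuracy(queries, gallery):
    """Fraction of queries whose most similar gallery row is the one with the same index."""
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    return float((sims.argmax(axis=1) == np.arange(len(queries))).mean())

# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
brain = text + 0.1 * rng.normal(size=text.shape)    # well-aligned brain embeddings
vision = text + 0.1 * rng.normal(size=text.shape)   # well-aligned vision embeddings
aligned = language_anchored_loss(brain, vision, text)
shuffled = language_anchored_loss(brain, vision, np.roll(text, 1, axis=0))
acc = top1_retrieval_accuracy(brain, text)
print(aligned, shuffled, acc)
```

Anchoring both modalities to the same text embeddings means brain and vision never need to be compared directly during training; they become comparable because each is aligned with language.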

Taxonomy

Topics: Multimodal Machine Learning Applications · Face Recognition and Perception · EEG and Brain-Computer Interfaces