Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation

Xiaozhao Liu; Dinggang Shen; Xihui Liu

arXiv:2505.17099·cs.CL·May 26, 2025

3 Reviews

TL;DR

This paper introduces GLIM, a model that learns interpretable EEG representations to generate semantically faithful text, addressing hallucination issues and enabling robust evaluation in brain decoding.

Contribution

The paper proposes a novel EEG-to-text decoding framework that emphasizes semantic summarization and interpretability, improving reliability and evaluation methods in brain decoding.

Findings

01. GLIM generates fluent, EEG-grounded sentences without teacher forcing.

02. Supports EEG-text retrieval and zero-shot semantic classification.

03. Demonstrates robustness on the ZuCo dataset.
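
As a rough illustration of the second finding, retrieval and zero-shot classification in a shared embedding space are commonly done by ranking candidates by cosine similarity. The sketch below assumes that an EEG embedding and candidate text embeddings already live in the same space; the vectors, the label names, and the helper `rank_candidates` are hypothetical and are not taken from the paper, whose actual procedure is not described on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(eeg_embedding, text_embeddings):
    """Return (label, similarity) pairs sorted from most to least similar."""
    scored = [(label, cosine(eeg_embedding, vec)) for label, vec in text_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up vectors standing in for encoder outputs (assumption, not real data).
eeg_vec = [0.9, 0.1, 0.0]
candidates = {
    "positive": [1.0, 0.0, 0.0],
    "neutral": [0.0, 1.0, 0.0],
    "negative": [-1.0, 0.0, 0.0],
}

ranking = rank_candidates(eeg_vec, candidates)
predicted = ranking[0][0]  # zero-shot label = nearest text embedding
```

The same ranking serves both uses: taking the top entry gives a zero-shot label, while the full ordered list gives a retrieval result.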

Abstract

Pretrained generative models have opened new frontiers in brain decoding by enabling the synthesis of realistic texts and images from non-invasive brain recordings. However, the reliability of such outputs remains questionable: do they truly reflect semantic activation in the brain, or are they merely hallucinated by the powerful generative models? In this paper, we focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings rather than the verbatim reconstruction of stimulus texts pursued previously. To this end, we propose the Generative Language Inspection Model (GLIM), which emphasizes learning informative and interpretable EEG representations to improve semantic grounding under heterogeneous and…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The modular, plug-and-play architecture with minimal preprocessing enables scalability.
2. Original problem reframing tied to a principled failure mode (posterior collapse) with a concrete mitigation (Sec. 2–3).
3. The three-pronged evaluation (generation, retrieval, classification) provides much stronger validation than previous work relying solely on BLEU/ROUGE scores.

Weaknesses

1. Main results appear single-run; no confidence intervals/seed variance for Table 1. Authors should report mean + CI over ≥3 seeds for all metrics, including controls.
2. The paper has limited technical novelty. Core components (Q-former-style alignment, contrastive learning, domain prompts) are borrowed from existing work. The contribution is primarily in their combination for this specific task.
3. Semantic evaluation (zero-shot classification) relies on pretrained LM priors; unclear whether im
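
The reporting requested in the first weakness above (mean and confidence interval over at least three seeds) could look roughly like the following. This is a minimal sketch, assuming a two-sided 95% Student-t interval; the per-seed scores are invented placeholders rather than results from the paper, and `mean_ci` is a hypothetical helper name.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci(scores):
    """Return (mean, half_width) of a 95% t-interval for a small set of per-seed scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two seeds to estimate variance")
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("critical value not tabulated for this many seeds")
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[df] * sem


# Placeholder scores for one metric across three seeds (assumption, not real data).
seed_scores = [0.40, 0.42, 0.44]
mean, half_width = mean_ci(seed_scores)
print(f"{mean:.3f} ± {half_width:.3f} (95% CI, n={len(seed_scores)})")
```

With only three seeds the t critical value is large (4.303), so the interval is wide; that is the honest picture of seed variance at small n, and it is one reason the same reporting is also requested for the control runs.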

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper is exceptionally clear and well organized, making it easy to follow both the modeling approach and its neuroscientific motivation. It is a model example of strong writing and structure in the EEG decoding literature.
2. The shift in objective, from literal text reconstruction to capturing the core semantic content of EEG, is conceptually meaningful and addresses an important limitation in previous EEG-to-text work. The attempt to quantify “semantic faithfulness” through embedding-base

Weaknesses

1. The methodological novelty is limited. The proposed “semantic subspace” and training objectives largely reuse existing alignment and generation strategies, and the paper introduces no fundamentally new algorithmic component. Its contribution lies primarily in combining these techniques into a coherent and well-presented EEG-to-text framework.
2. The distinction between literal decoding (“word-by-word reconstruction”) and semantic summarization is not fully explained. It is unclear how the mod

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. Novel problem framing: identifying posterior collapse as the root cause of hallucinations and recasting the task as semantic summarisation is original and interesting.
2. Rich ablation study: combining contrastive alignment, MTV data augmentation, and lightweight prompt adapters yields consistent gains in ablations.
3. Thorough self-diagnosis: the “noise-input” test and multi-view evaluation (generation, retrieval, zero-shot classification) demonstrate that the model actually listens to the E

Weaknesses

1. Missing SOTA baselines. Only EEG2Text is reported. Please include recent systems (e.g., DeWave, STG-based decoders, contrastive/MAE models from ACL/NeurIPS 2023–24) under the same split, or justify non-applicability and adapt where feasible.
2. Single-corpus evidence. All results are on ZuCo. Add cross-corpus tests (e.g., Natural Stories, UCLA Harry-Potter, Belt-2, ChineseEEG) to support generalization.
3. EEG encoder underuses neural structure. Temporal cross-attention downsampling overlooks
