TL;DR
This paper introduces GLIM, a model that learns interpretable EEG representations to generate semantically faithful text, addressing hallucination issues and enabling robust evaluation in brain decoding.
Contribution
The paper proposes a novel EEG-to-text decoding framework that emphasizes semantic summarization and interpretability, improving reliability and evaluation methods in brain decoding.
Findings
GLIM generates fluent, EEG-grounded sentences without teacher forcing.
GLIM also supports EEG-text retrieval and zero-shot semantic classification.
It demonstrates robustness on the ZuCo dataset.
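Retrieval and zero-shot classification, as listed above, are commonly implemented as nearest-neighbor search in a shared embedding space. The sketch below illustrates that general idea only; it is not GLIM's actual implementation. It assumes EEG and text embeddings already live in the same space. The function names are hypothetical and the vectors and labels are made-up toy values, not model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(eeg_embedding, text_embeddings):
    """Rank candidate texts (or class labels) by similarity to one EEG embedding.

    Hypothetical helper: `text_embeddings` maps a candidate name to its vector.
    """
    scores = {name: cosine(eeg_embedding, vec) for name, vec in text_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy values for illustration only (assumed, not taken from the paper).
eeg_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "positive": [1.0, 0.0, 0.0],
    "neutral": [0.0, 1.0, 0.0],
    "negative": [-1.0, 0.0, 0.0],
}

ranking = rank_candidates(eeg_embedding, label_embeddings)
predicted_label = ranking[0]  # zero-shot prediction = most similar label
print(ranking, predicted_label)
```

Under this framing, retrieval ranks candidate sentences and zero-shot classification ranks label descriptions; the scoring function is the same in both cases.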
Abstract
Pretrained generative models have opened new frontiers in brain decoding by enabling the synthesis of realistic texts and images from non-invasive brain recordings. However, the reliability of such outputs remains questionable: do they truly reflect semantic activation in the brain, or are they merely hallucinated by the powerful generative models? In this paper, we focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings rather than the verbatim reconstruction of stimulus texts attempted previously. To this end, we propose the Generative Language Inspection Model (GLIM), which emphasizes learning informative and interpretable EEG representations to improve semantic grounding under heterogeneous and…
Peer Reviews
Decision·Submitted to ICLR 2026
1. The modular, plug-and-play architecture with minimal preprocessing enables scalability. 2. Original problem reframing tied to a principled failure mode (posterior collapse) with a concrete mitigation (Sec. 2–3). 3. The three-pronged evaluation (generation, retrieval, classification) provides much stronger validation than previous work relying solely on BLEU/ROUGE scores.
1. Main results appear single-run; no confidence intervals/seed variance for Table 1. Authors should report mean+CI over ≥3 seeds for all metrics, including controls. 2. The paper has limited technical novelty. Core components (Q-former-style alignment, contrastive learning, domain prompts) are borrowed from existing work. The contribution is primarily in their combination for this specific task. 3. Semantic evaluation (zero-shot classification) relies on pretrained LM priors; unclear whether im
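The seed-variance request in point 1 above can be met with a short calculation. The sketch below reports a mean and a two-sided 95% confidence half-width from a Student-t interval over a handful of seeds. The function name and the scores are hypothetical placeholders, not results from the paper, and the critical-value table only covers 3 to 5 seeds.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776}


def mean_and_ci95(scores):
    """Return (mean, half-width of the 95% CI) for per-seed metric values."""
    n = len(scores)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 3 to 5 seeds")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, T_CRIT_95[df] * sem


# Placeholder per-seed scores (assumed for illustration only).
seed_scores = [0.40, 0.42, 0.44]
mean, half_width = mean_and_ci95(seed_scores)
print(f"{mean:.3f} ± {half_width:.3f}")
```

With only three seeds the t critical value is large, so the interval is wide; this is one reason reviewers ask for at least three runs, and ideally more.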
1. The paper is exceptionally clear and well organized, making it easy to follow both the modeling approach and its neuroscientific motivation. It is a model example of strong writing and structure in the EEG decoding literature. 2. The shift in objective—from literal text reconstruction to capturing the core semantic content of EEG—is conceptually meaningful and addresses an important limitation in previous EEG-to-text work. The attempt to quantify “semantic faithfulness” through embedding-base
1. The methodological novelty is limited. The proposed “semantic subspace” and training objectives largely reuse existing alignment and generation strategies, and the paper introduces no fundamentally new algorithmic component. Its contribution lies primarily in combining these techniques into a coherent and well-presented EEG-to-text framework. 2. The distinction between literal decoding (“word-by-word reconstruction”) and semantic summarization is not fully explained. It is unclear how the mod
1. Novel problem framing: identifying posterior collapse as the root cause of hallucinations and recasting the task as semantic summarisation is original and interesting. 2. Rich ablation study: combining contrastive alignment, MTV data augmentation, and lightweight prompt adapters yields consistent gains in ablations. 3. Thorough self-diagnosis: the “noise-input” test and multi-view evaluation (generation, retrieval, zero-shot classification) demonstrate that the model actually listens to the E
1. Missing SOTA baselines. Only EEG2Text is reported. Please include recent systems (e.g., DeWave, STG-based decoders, contrastive/MAE models from ACL/NeurIPS 2023–24) under the same split, or justify non-applicability and adapt where feasible. 2. Single-corpus evidence. All results are on ZuCo. Add cross-corpus tests (e.g., Natural Stories, UCLA Harry-Potter, Belt-2, ChineseEEG) to support generalization. 3. EEG encoder underuses neural structure. Temporal cross-attention downsampling overlooks
