From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining
Linnea Evanson, Mingfang Zhang, Hubert Banville, Saarang Panchavati, Pierre Bourdillon, Jean-Rémi King

TL;DR
This paper presents a scalable intracranial speech decoding framework using extensive week-long recordings and supervised pretraining, significantly improving performance over traditional methods and addressing cross-day variability.
Contribution
Introduces a pretraining framework leveraging large-scale intracranial recordings, enhancing speech decoding accuracy and addressing day-to-day neural variability.
Findings
Pretraining with large datasets outperforms classic models.
Model performance scales log-linearly with dataset size.
Brain activity representations drift across days, requiring models to handle variability.
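The log-linear scaling claim can be made concrete with a small fit. The sketch below is illustrative only: the function name `fit_log_linear` and the numbers are hypothetical, synthetic values, not results from the paper. It assumes "log-linear" means the score grows linearly in the logarithm of the amount of pretraining data.

```python
import numpy as np

def fit_log_linear(hours, scores):
    """Least-squares fit of score = a + b * log10(hours).

    `hours` must be strictly positive, since log10(0) is undefined.
    Returns the intercept a and the slope b (gain per tenfold more data).
    """
    x = np.log10(np.asarray(hours, dtype=float))
    y = np.asarray(scores, dtype=float)
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the highest-order coefficient first
    return a, b

# Synthetic, illustrative data (NOT taken from the paper):
# a score that gains 0.05 for every tenfold increase in pretraining hours.
hours = [1, 3, 10, 30, 100]
scores = [0.10 + 0.05 * np.log10(h) for h in hours]

a, b = fit_log_linear(hours, scores)
print(f"intercept={a:.3f}, gain per 10x data={b:.3f}")
```

Under this reading, a straight line on a plot with a logarithmic x-axis means each tenfold increase in data buys the same absolute improvement.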
Abstract
Decoding speech from brain activity has typically relied on limited neural recordings collected during short and highly controlled experiments. Here, we introduce a framework to leverage week-long intracranial and audio recordings from patients undergoing clinical monitoring, effectively increasing the training dataset size by over two orders of magnitude. With this pretraining, our contrastive learning model substantially outperforms models trained solely on classic experimental data, with gains that scale log-linearly with dataset size. Analysis of the learned representations reveals that, while brain activity represents speech features, its global structure largely drifts across days, highlighting the need for models that explicitly account for cross-day variability. Overall, our approach opens a scalable path toward decoding and modeling brain representations in both real-life and…
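The abstract names a contrastive learning model but does not give the loss. As a minimal sketch, assuming a one-directional InfoNCE objective that matches each brain-activity embedding to the embedding of the simultaneously recorded audio, the idea could look like the following. The function name, the temperature value, and the embedding shapes are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(brain_emb, audio_emb, temperature=0.1):
    """One-directional InfoNCE loss (brain -> audio).

    Row i of `brain_emb` and row i of `audio_emb` are assumed to come from
    the same time window; every other row in the batch acts as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    z_a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)

    logits = z_b @ z_a.T / temperature                   # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct audio segment for brain window i sits on the diagonal.
    return -np.mean(np.diag(log_probs))

# Synthetic embeddings standing in for encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
brain = audio + 0.01 * rng.normal(size=(8, 16))  # well-aligned pairs

loss_matched = info_nce_loss(brain, audio)
loss_shuffled = info_nce_loss(brain, np.roll(audio, 1, axis=0))  # misaligned pairs
print(loss_matched, loss_shuffled)
```

Aligned pairs yield a lower loss than misaligned ones, which is the signal a contrastive decoder is trained on.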
Peer Reviews
Decision·Submitted to ICLR 2026
- Interesting idea to leverage ambient audio for a supervised pre-training stage
- Fine-tuning the pre-trained model seems to convincingly beat the baseline
- Error bars and statistical tests included show that improvements are significant
- Performance appears to scale log-linearly with pre-training data between 0-100 hours
- Missing baselines: Please include (1) an end-to-end baseline where you train your full architecture directly on the supervised data and (2) a baseline where you train a linear layer directly on the raw iEEG of the downstream data. Without these, it’s hard to determine whether the pre-training was necessary at all.
- Minor: Lines 126-128: Özdogan et al. 2025 quotes some of the work from [A] so this should also be cited here. Similarly, line 441/442 discusses unsupervised models, for which you ma
1. Real-world relevance: The authors effectively leverage week-long clinical iEEG recordings paired with ambient audio (data typically discarded) to scale training data by over two orders of magnitude. This represents a meaningful step toward real-world, scalable brain-speech decoding and is clearly motivated and illustrated (Figure 1).
2. Rigorous and comprehensive experimental validation: The pretraining framework consistently improves downstream speech decoding across all three subjects, with
1. Limited comparison to recent state-of-the-art baselines: The paper does not adequately situate itself within the rapidly evolving literature on neural decoding. Key recent works, such as self-supervised pretraining on iEEG [1,2] and cross-subject or cross-session transfer learning [3], are not discussed or compared. This omission weakens the claim of methodological novelty.
2. Incomplete coverage of pretraining innovations in brain decoding: While this paper emphasizes supervised pre-training
- It is a strength that large amounts of data (over the course of a week) can be effectively used, apparently scalably. It is hard to assess the "over two orders of magnitude" claim (L17), though. This also reveals one of the main insights, regarding the cross-day neural drift and the need to correct for it.
- It is only the most minor of complaints, but the format of the Introduction is not quite typical of a scientific publication. It is suggested to omit the boldfaced headings, or to add a more narrative opening. Some claims are mentioned 'loosely' (e.g., "patients...typically spend about a week", "about 100X more neural data") or without citation. The writing generally can be tightened up and improved.
- Although references and related work are distributed throughout the paper, these tend to be
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Neural dynamics and brain function · Hearing Loss and Rehabilitation
