MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training

Dulhan Jayalath; Oiwi Parker Jones

arXiv:2602.02494 · cs.LG · May 12, 2026

TL;DR

MEG-XL pre-trains on 2.5 minutes of MEG context per sample, far longer than the few seconds most methods use, so that brain-to-text word decoders can be fine-tuned with much less data.

Contribution

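The paper presents MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample (5-300x longer than prior work, equivalent to 191k tokens). Fine-tuned for word decoding, it matches supervised performance with a fraction of the data and outperforms brain foundation models.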

Findings

01 Fine-tuned for word decoding, MEG-XL matches supervised performance with a fraction of the data (e.g. 1 hour versus 50 hours).

02 Models pre-trained with longer neural context learn representations that transfer better to word decoding.

03 Long-context pre-training exploits extended neural information that methods using only a few seconds of context often discard.

Abstract

Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold gradually over minutes, most methods pre-train with only a few seconds of context. Thus, we propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work, and equivalent to 191k tokens, capturing extended neural context. Fine-tuning on the task of word decoding from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1hr vs 50hrs) and outperforms brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that…
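The figures quoted in the abstract imply a few derived quantities that are easy to check by hand. The sketch below only does that arithmetic; the variable names are illustrative, "191k" is treated as the rounded value 191,000, and the resulting tokens-per-second figure is an implied average rather than a description of how the authors tokenise MEG.

```python
# Figures quoted in the abstract (variable names are illustrative).
CONTEXT_MINUTES = 2.5          # MEG context per pre-training sample
TOKENS_PER_SAMPLE = 191_000    # "191k tokens", rounded in the source
RATIO_RANGE = (5, 300)         # "5-300x longer than prior work"
FINETUNE_HOURS = 1             # example from the abstract ("1hr vs 50hrs")
SUPERVISED_HOURS = 50

# Context length in seconds.
context_seconds = CONTEXT_MINUTES * 60

# Implied average token rate (an assumption-free division, not the
# authors' stated tokenisation scheme).
tokens_per_second = TOKENS_PER_SAMPLE / context_seconds

# Context lengths of prior work implied by the 5-300x ratio,
# ordered from shortest to longest.
prior_context_seconds = tuple(context_seconds / r for r in reversed(RATIO_RANGE))

# Data reduction in the quoted example.
data_reduction = SUPERVISED_HOURS / FINETUNE_HOURS

print(f"context: {context_seconds:.0f} s")
print(f"implied rate: {tokens_per_second:.0f} tokens/s")
print(f"implied prior context: {prior_context_seconds[0]} s to {prior_context_seconds[1]} s")
print(f"data reduction in the example: {data_reduction:.0f}x")
```

Under these figures, the prior-work contexts implied by the 5-300x range span roughly half a second to half a minute, which is consistent with the abstract's remark that most methods pre-train with only a few seconds of context.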


Code & Models

Repositories

neural-processing-lab/MEG-XL (GitHub)

Models

pnpl/MEG-XL (Hugging Face model, 1 like)

Datasets

pnpl/LibriBrain (dataset, 1.5k downloads)
