A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models

SungJun Cho; Chetan Gohil; Rukuang Huang; Oiwi Parker Jones; and Mark W. Woolrich

arXiv:2602.16626 · cs.LG · February 19, 2026

TL;DR

This study systematically compares different sample-level tokenization strategies for transformer-based neuroimaging models applied to MEG data, finding that simple fixed schemes perform comparably to learnable methods across various evaluation metrics.

Contribution

The paper presents a systematic evaluation of tokenization methods for neuroimaging (MEG) data, including a novel autoencoder-based learnable tokenizer, and shows that simple fixed strategies are effective.

Findings

1. Both learnable and fixed tokenizers achieve high reconstruction fidelity.
2. Simple fixed tokenization performs comparably to learnable methods on multiple metrics.
3. Results suggest fixed strategies are sufficient for neural foundation models.

Abstract

Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data. Such models often require discretization of continuous neural time series data, a process referred to as 'tokenization'. However, the impact of different tokenization strategies for neural data is currently poorly understood. In this work, we present a systematic evaluation of sample-level tokenization strategies for transformer-based large neuroimaging models (LNMs) applied to magnetoencephalography (MEG) data. We compare learnable and non-learnable tokenizers by examining their signal reconstruction fidelity and their impact on subsequent foundation modeling performance (token prediction, biological plausibility of generated data, preservation of subject-specific information, and performance on downstream tasks). For the learnable tokenizer, we introduce…
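To make "discretization of continuous neural time series" concrete, here is a minimal sketch of one possible fixed (non-learnable) sample-level tokenizer. It maps each sample to one of 256 integer tokens by applying μ-law companding followed by uniform binning. It then inverts that mapping and scores reconstruction fidelity with a Pearson correlation. The μ-law scheme, the 256-token vocabulary, max-abs scaling, the correlation metric, and all function names are illustrative assumptions. They are not taken from the paper and need not match the fixed or learnable tokenizers it evaluates.

```python
import math
import random
from typing import List, Tuple


def mu_law_tokenize(
    x: List[float], n_tokens: int = 256, mu: float = 255.0
) -> Tuple[List[int], float]:
    """Hypothetical fixed tokenizer: map every sample to a token in [0, n_tokens)."""
    # Max-abs scaling puts samples in [-1, 1]; the scale is returned so the
    # original amplitude can be restored after detokenization.
    scale = max((abs(v) for v in x), default=0.0) or 1.0
    tokens = []
    for v in x:
        y = v / scale
        # mu-law companding: finer resolution for small amplitudes.
        c = math.copysign(math.log1p(mu * abs(y)) / math.log1p(mu), y)
        # Uniform binning of the companded value over [-1, 1].
        idx = int((c + 1.0) / 2.0 * n_tokens)
        tokens.append(min(idx, n_tokens - 1))
    return tokens, scale


def mu_law_detokenize(
    tokens: List[int], scale: float, n_tokens: int = 256, mu: float = 255.0
) -> List[float]:
    """Invert the hypothetical tokenizer using each bin's centre value."""
    out = []
    for t in tokens:
        c = (t + 0.5) / n_tokens * 2.0 - 1.0  # bin centre in [-1, 1]
        # Inverse mu-law: |y| = ((1 + mu) ** |c| - 1) / mu
        y = math.copysign((math.pow(1.0 + mu, abs(c)) - 1.0) / mu, c)
        out.append(y * scale)
    return out


def pearson_r(a: List[float], b: List[float]) -> float:
    """Pearson correlation, used here as a simple reconstruction-fidelity score."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Synthetic stand-in for one MEG channel (not real data): 10 Hz oscillation
    # sampled at an assumed 250 Hz, plus Gaussian noise.
    rng = random.Random(0)
    signal = [math.sin(2 * math.pi * 10 * t / 250) + 0.3 * rng.gauss(0, 1)
              for t in range(1000)]
    tokens, scale = mu_law_tokenize(signal)
    recon = mu_law_detokenize(tokens, scale)
    print(len(set(tokens)), "distinct tokens used")
    print(f"reconstruction r = {pearson_r(signal, recon):.4f}")
```

Because the mapping is fixed, there is nothing to train. A learnable tokenizer, such as the autoencoder-based one introduced in the paper, would instead learn the sample-to-token mapping from data.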

Taxonomy

Topics: Functional Brain Connectivity Studies · EEG and Brain-Computer Interfaces · Machine Learning in Healthcare