EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training

Yuting Tang; Weibang Jiang; Shanglin Li; Yong Li; Chenyu Liu; Xinliang Zhou; Yi Ding; Cuntai Guan

arXiv:2512.12210 · cs.LG · January 27, 2026

TL;DR

EEG-DLite introduces a data distillation method that efficiently reduces EEG datasets by removing noise and redundancy, enabling effective large EEG model pre-training with significantly less data.

Contribution

This work presents the first systematic EEG data distillation framework that improves training efficiency without sacrificing model performance.

Findings

01. Training on a 5% subset selected by EEG-DLite matches full-dataset performance.

02. EEG-DLite reduces training data volume while preserving sample diversity.

03. Pre-training on distilled datasets yields comparable or better downstream task results.

Abstract

Large-scale EEG foundation models have shown strong generalization across a range of downstream tasks, but their training remains resource-intensive due to the volume and variable quality of EEG data. In this work, we introduce EEG-DLite, a data distillation framework that enables more efficient pre-training by selectively removing noisy and redundant samples from large EEG datasets. EEG-DLite begins by encoding EEG segments into compact latent representations using a self-supervised autoencoder, allowing sample selection to be performed efficiently and with reduced sensitivity to noise. Based on these representations, EEG-DLite filters out outliers and minimizes redundancy, resulting in a smaller yet informative subset that retains the diversity essential for effective foundation model training. Through extensive experiments, we demonstrate that training on only 5 percent of a…
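The selection stage described in the abstract (filter outliers in latent space, then reduce redundancy) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' method: the latent vectors are taken as already computed by some encoder, outliers are defined by a centroid-distance quantile, and redundancy is reduced with greedy farthest-point sampling. The function name `distill_indices` and its parameters are hypothetical.

```python
import numpy as np


def distill_indices(latents, keep_fraction=0.05, outlier_quantile=0.95, seed=0):
    """Return indices of a small, diverse subset of rows in `latents`.

    latents: (n_samples, dim) array of precomputed embeddings. This is a
    hypothetical input; the source derives its representations from a
    self-supervised autoencoder, which is not reproduced here.
    """
    latents = np.asarray(latents, dtype=float)
    n = latents.shape[0]
    budget = max(1, int(round(keep_fraction * n)))

    # Step 1 (assumed outlier rule): drop samples whose distance to the
    # latent centroid exceeds the chosen quantile of all distances.
    centroid = latents.mean(axis=0)
    dist_to_centroid = np.linalg.norm(latents - centroid, axis=1)
    threshold = np.quantile(dist_to_centroid, outlier_quantile)
    candidates = np.flatnonzero(dist_to_centroid <= threshold)
    budget = min(budget, candidates.size)

    # Step 2 (assumed redundancy rule): greedy farthest-point sampling.
    # Each round picks the candidate farthest from everything selected so far.
    rng = np.random.default_rng(seed)
    pool = latents[candidates]
    first = int(rng.integers(pool.shape[0]))
    selected = [first]
    min_dist = np.linalg.norm(pool - pool[first], axis=1)
    min_dist[first] = -np.inf  # never re-pick an already selected sample
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(pool - pool[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return candidates[np.array(selected)]
```

Calling `distill_indices(latents, keep_fraction=0.05)` on an array of embeddings returns row indices for a 5% subset, which could then be used to index the original EEG segments. Working in a compact latent space keeps the pairwise distance computations cheap, which is the efficiency argument the abstract makes for encoding before selecting.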

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications