EEG-DLite: Dataset Distillation for Efficient Large EEG Model Training
Yuting Tang, Weibang Jiang, Shanglin Li, Yong Li, Chenyu Liu, Xinliang Zhou, Yi Ding, Cuntai Guan

TL;DR
EEG-DLite introduces a data distillation method that efficiently reduces EEG datasets by removing noise and redundancy, enabling effective large EEG model pre-training with significantly less data.
Contribution
This work presents the first systematic EEG data distillation framework that improves training efficiency without sacrificing model performance.
Findings
Training on a distilled subset containing only 5% of the original data matches the performance of training on the full dataset.
EEG-DLite reduces training data volume while maintaining diversity.
Distilled datasets lead to comparable or better downstream task results.
Abstract
Large-scale EEG foundation models have shown strong generalization across a range of downstream tasks, but their training remains resource-intensive due to the volume and variable quality of EEG data. In this work, we introduce EEG-DLite, a data distillation framework that enables more efficient pre-training by selectively removing noisy and redundant samples from large EEG datasets. EEG-DLite begins by encoding EEG segments into compact latent representations using a self-supervised autoencoder, allowing sample selection to be performed efficiently and with reduced sensitivity to noise. Based on these representations, EEG-DLite filters out outliers and minimizes redundancy, resulting in a smaller yet informative subset that retains the diversity essential for effective foundation model training. Through extensive experiments, we demonstrate that training on only 5 percent of a…
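The abstract does not specify how outliers are scored or how redundancy is measured. The sketch below is therefore only an illustration of the general "filter outliers, then reduce redundancy in latent space" pipeline, not the authors' method, and it rests on stated assumptions. A random array stands in for the autoencoder's latent codes. Outliers are scored by mean distance to their k nearest neighbours, with a swapped-in quantile threshold. Redundancy is reduced with greedy farthest-point (k-center) selection. The function names `filter_outliers`, `select_diverse` and `distill`, along with their parameters, are hypothetical.

```python
import numpy as np


def filter_outliers(latents: np.ndarray, k: int = 5, quantile: float = 0.95) -> np.ndarray:
    """Hypothetical outlier filter: keep samples whose mean distance to their
    k nearest neighbours in latent space is at or below the given quantile."""
    # Pairwise Euclidean distances (O(n^2) memory; for illustration only).
    diffs = latents[:, None, :] - latents[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Column 0 after sorting is each sample's distance to itself (0), so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    score = knn.mean(axis=1)
    threshold = np.quantile(score, quantile)
    return np.flatnonzero(score <= threshold)


def select_diverse(latents: np.ndarray, budget: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-point (k-center) selection: repeatedly add the sample
    farthest from everything already chosen, which discourages near-duplicates."""
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    budget = min(budget, n)
    chosen = [int(rng.integers(n))]
    # Distance from every sample to its nearest already-chosen sample.
    min_dist = np.linalg.norm(latents - latents[chosen[0]], axis=1)
    while len(chosen) < budget:
        nxt = int(np.argmax(min_dist))
        if min_dist[nxt] == 0:
            break  # only exact duplicates of chosen samples remain
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(latents - latents[nxt], axis=1))
    return np.array(chosen)


def distill(latents: np.ndarray, ratio: float = 0.05) -> np.ndarray:
    """Outlier filtering followed by redundancy reduction.
    Returns indices into the original latent array."""
    kept = filter_outliers(latents)
    budget = max(1, int(round(ratio * latents.shape[0])))
    picked = select_diverse(latents[kept], budget)
    return kept[picked]


# Usage with random vectors standing in for autoencoder latents (assumption).
rng = np.random.default_rng(42)
z = rng.normal(size=(400, 16))
idx = distill(z, ratio=0.05)
print(len(idx))  # 20
```

The `ratio=0.05` default only mirrors the 5 percent figure quoted above. At the scale of real EEG corpora, the dense pairwise-distance matrix would need to be replaced by an approximate nearest-neighbour index.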
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
