Quantity versus Diversity: Influence of Data on Detecting EEG Pathology with Advanced ML Models
Martyna Poziomska, Marian Dovgialo, Przemysław Olbratowski, Paweł Niedbalski, Paweł Ogniewski, Joanna Zych, Jacek Rogala, Jarosław Żygierewicz

TL;DR
This paper examines how data quantity and diversity affect EEG pathology detection using advanced machine learning models, highlighting that larger datasets improve accuracy and can offset diversity challenges, especially with attention-based neural networks.
Contribution
Introduces the Elmiko dataset, the largest publicly available EEG corpus, and analyzes the effects of data size and diversity on model performance for EEG pathology detection.
Findings
Larger datasets improve predictive accuracy.
Variation in pathological conditions, recording protocols, and labeling standards significantly degrades model performance.
Attention-based neural networks benefit from increased data volume.
Abstract
This study investigates the impact of quantity and diversity of data on the performance of various machine-learning models for detecting general EEG pathology. We utilized an EEG dataset of 2,993 recordings from Temple University Hospital and a dataset of 55,787 recordings from Elmiko Biosignals sp. z o.o. The latter contains data from 39 hospitals and a diverse patient set with varied conditions. Thus, we introduce the Elmiko dataset - the largest publicly available EEG corpus. Our findings show that small and consistent datasets enable a wide range of models to achieve high accuracy; however, variations in pathological conditions, recording protocols, and labeling standards lead to significant performance degradation. Nonetheless, increasing the number of available recordings improves predictive accuracy and may even compensate for data diversity, particularly in neural networks based…
Taxonomy
Topics: Brain Tumor Detection and Classification · Machine Learning in Healthcare
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
