Comprehensive Methodology for Sample Augmentation in EEG Biomarker Studies for Alzheimer's Risk Classification
Veronica Henao Isaza, David Aguillon, Carlos Andres Tobon Quintero, Francisco Lopera, John Fredy Ochoa Gomez

TL;DR
This study presents a comprehensive approach combining EEG data processing, harmonization, and propensity score matching to enhance sample size and improve Alzheimer's disease risk classification accuracy.
Contribution
It introduces an integrated methodology for EEG data harmonization and balancing to reliably classify AD risk with limited samples.
Findings
Sample balancing via propensity score matching improved classification accuracy to 0.92-0.96.
Harmonization reduced site effects while preserving key covariates.
The approach enables precise AD risk identification even with small datasets.
Abstract
Background: Dementia, marked by cognitive decline, is a global health challenge. Alzheimer's disease (AD), the leading type, accounts for ~70% of cases. Electroencephalography (EEG) measures show promise in identifying AD risk, but obtaining large samples for reliable comparisons is challenging. Objective: This study integrates signal processing, harmonization, and statistical techniques to enhance sample size and improve AD risk classification reliability. Methods: We used advanced EEG preprocessing, feature extraction, harmonization, and propensity score matching (PSM) to balance healthy non-carriers (HC) and asymptomatic E280A mutation carriers (ACr). Data from four databases were harmonized to adjust site effects while preserving covariates like age and sex. PSM ratios (2:1, 5:1, 10:1) were applied to assess sample size impact on model performance. The final dataset underwent…
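The propensity score matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a ratio such as 2:1 means k non-carriers (HC) matched to each carrier (ACr), that the propensity model is a plain logistic regression on age and sex, and that matching is greedy nearest-neighbour without replacement. The covariates are synthetic, and the names `fit_propensity`, `propensity`, and `match_k_to_1` are hypothetical.

```python
import math
import random


def fit_propensity(X, t, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent.

    Returns the weight vector with the bias term first.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, ti in zip(X, t):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - ti
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability of being a carrier given covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def match_k_to_1(scores, t, k):
    """Greedy nearest-neighbour k:1 matching without replacement.

    Returns {carrier_index: [k control indices]}.
    """
    treated = [i for i, ti in enumerate(t) if ti == 1]
    pool = {i for i, ti in enumerate(t) if ti == 0}
    matches = {}
    for i in treated:
        nearest = sorted(pool, key=lambda j: abs(scores[j] - scores[i]))[:k]
        matches[i] = nearest
        pool -= set(nearest)  # each control is used at most once
    return matches


random.seed(0)
# Synthetic covariates (NOT study data): [standardised age, sex]
carriers = [[random.gauss(-0.3, 1.0), random.randint(0, 1)] for _ in range(20)]
controls = [[random.gauss(0.1, 1.0), random.randint(0, 1)] for _ in range(300)]
X = carriers + controls
t = [1] * len(carriers) + [0] * len(controls)

w = fit_propensity(X, t)
scores = [propensity(w, x) for x in X]
matches = match_k_to_1(scores, t, k=2)  # assumed meaning of the 2:1 ratio
print(len(matches), "carriers matched to", sum(len(v) for v in matches.values()), "controls")
```

Changing `k` to 5 or 10 would give the other ratios under the same assumption. A production analysis would normally also check covariate balance after matching and might apply a caliper on the score distance.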
Taxonomy
Topics: Machine Learning in Bioinformatics
