# Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer’s risk classification

**Authors:** Verónica Henao Isaza, David Aguillon, Carlos Andrés Tobón-Quintero, Francisco Lopera, John Fredy Ochoa-Gómez, Diego A. Forero

PMC · DOI: 10.1371/journal.pone.0343722 · 2026-03-11

## TL;DR

This paper introduces an EEG sample enrichment framework for Alzheimer’s risk classification that harmonizes data from four independent cohorts with neuroHarmonize and balances the control and at-risk groups with Propensity Score Matching (PSM).

## Contribution

A novel EEG sample enrichment framework combining data harmonization and Propensity Score Matching to enhance Alzheimer’s risk classification.

## Key findings

- Sample enrichment via PSM improved classification accuracy, with decision tree models reaching 0.91–0.96.
- Higher enrichment ratios increased model stability and generalizability, as shown by learning curves and confusion matrices.

## Abstract

Dementia, particularly Alzheimer’s disease (AD), constitutes a major global health concern, with AD accounting for approximately 70% of all cases. EEG-based biomarkers hold promise for early identification of individuals at risk; however, small and heterogeneous samples frequently limit generalizability.

An EEG-based sample enrichment framework was developed by integrating advanced signal processing, component-level feature extraction, data harmonization (neuroHarmonize), and Propensity Score Matching (PSM). EEG data from four independent cohorts were harmonized to reduce site-related variability while preserving covariates such as age and sex. Features including power, entropy, coherence, synchronization likelihood, and cross-frequency coupling were extracted from independent components. PSM was applied at 2:1, 5:1, and 10:1 ratios to expand and balance the control group (HC) relative to the Alzheimer’s risk group (ACr), composed of PSEN1-E280A mutation carriers without cognitive symptoms.
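The k:1 matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which covariates or matching algorithm were used, so the sketch assumes age and sex as covariates, a hand-rolled logistic regression for the propensity score, and greedy nearest-neighbour matching without replacement. The cohort sizes and all function names (`fit_logistic`, `match_k_to_1`) are hypothetical, and the data are synthetic.

```python
import math
import random

random.seed(0)  # reproducible synthetic data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_k_to_1(scores, is_case, k):
    """Greedy nearest-neighbour matching without replacement: k controls per case."""
    cases = [i for i, c in enumerate(is_case) if c]
    pool = {i for i, c in enumerate(is_case) if not c}
    matches = {}
    for i in cases:
        # Closest controls by propensity score; ties broken by index.
        nearest = sorted(pool, key=lambda j: (abs(scores[j] - scores[i]), j))[:k]
        matches[i] = nearest
        pool -= set(nearest)
    return matches


# Hypothetical cohort: 12 at-risk subjects (ACr) and 150 controls (HC).
ages = [random.gauss(35, 6) for _ in range(12)] + [random.gauss(45, 12) for _ in range(150)]
sexes = [random.randint(0, 1) for _ in range(162)]
is_case = [True] * 12 + [False] * 150

# Standardize age so gradient descent behaves well.
mu = sum(ages) / len(ages)
sd = math.sqrt(sum((a - mu) ** 2 for a in ages) / len(ages))
X = [[(a - mu) / sd, float(s)] for a, s in zip(ages, sexes)]

# Propensity score: estimated probability of belonging to the ACr group.
w = fit_logistic(X, [int(c) for c in is_case])
scores = [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) for x in X]

# Build matched samples at the three ratios named in the abstract.
matched = {k: match_k_to_1(scores, is_case, k) for k in (2, 5, 10)}
for k, m in matched.items():
    n_controls = sum(len(v) for v in m.values())
    print(f"{k}:1 -> {len(m)} ACr matched to {n_controls} HC")
```

Greedy matching is order-dependent, so a real analysis would more likely rely on an established matching package and check covariate balance after matching.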

Sample enrichment through PSM improved classification accuracy, with decision tree models yielding values between 0.91 and 0.96. Higher enrichment ratios enhanced model stability and generalizability, as shown by learning curves and confusion matrices. Feature selection was based on model performance and effect sizes (Cohen’s d).
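As a rough illustration of effect-size-based feature ranking, the sketch below computes Cohen's d with a pooled standard deviation and orders features by its absolute value. The abstract does not specify which variant of Cohen's d was used, so the pooled-SD form is an assumption, and the feature names and values are made up for the example.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation of groups a and b."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical per-group feature values (ACr vs. HC); not data from the paper.
features = {
    "theta_power": ([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]),
    "alpha_entropy": ([0.9, 1.1, 1.0], [1.0, 1.0, 1.3]),
}

# Rank features by absolute effect size, largest first.
ranked = sorted(features, key=lambda name: abs(cohens_d(*features[name])), reverse=True)
for name in ranked:
    print(f"{name}: d = {cohens_d(*features[name]):.2f}")
```

In practice such a ranking would be combined with model performance, as the abstract describes, rather than used on its own.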

The proposed framework addresses sample size and variability constraints in EEG-based AD risk classification.

Harmonization and statistical balancing provide a replicable strategy for multicenter EEG studies targeting early AD detection.

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975), dementia (MONDO:0001627)

## Full-text entities

- **Genes:** MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}, APP (amyloid beta precursor protein) [NCBI Gene 351] {aka AAA, ABETA, ABPP, AD1, APPI, CTFgamma}, PSEN1 (presenilin 1) [NCBI Gene 5663] {aka ACNINV3, AD3, CMD1U, FAD, PS-1, PS1}
- **Diseases:** functions (MESH:D003291), cognitive symptoms (MESH:D019954), neurological diseases (MESH:D020271), neurofibrillary (MESH:D055956), Parkinson's disease (MESH:D010300), neurodegeneration (MESH:D019636), bipolar disorder (MESH:D001714), Dementia (MESH:D003704), depression (MESH:D003866), cognitive decline (MESH:D003072), deficits in memory (MESH:D008569)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** E280A

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978488/full.md

---
Source: https://tomesphere.com/paper/PMC12978488