Ensemble feature selection with data-driven thresholding for Alzheimer's disease biomarker discovery
Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry Brodaty, Arcot Sowmya (for the Sydney Memory and Ageing Study and the Alzheimer's Disease Neuroimaging Initiative)

TL;DR
This paper introduces data-driven thresholding methods for ensemble feature selection to improve stability and relevance of biomarkers in high-dimensional Alzheimer's disease datasets, aiding early diagnosis.
Contribution
It develops novel automatic thresholds for ensemble feature selection, enhancing stability and relevance in clinical biomarker discovery for Alzheimer's disease.
Findings
The data-driven thresholds improve the stability and accuracy of ensemble feature selection.
The selected features reflect currently recognised Alzheimer's disease biomarkers.
Validated on real-world clinical datasets.
Abstract
Healthcare datasets present many challenges to both machine learning and statistics as their data are typically heterogeneous, censored, high-dimensional and have missing information. Feature selection is often used to identify the important features but can produce unstable results when applied to high-dimensional data, selecting a different set of features on each iteration. The stability of feature selection can be improved with the use of feature selection ensembles, which aggregate the results of multiple base feature selectors. A threshold must be applied to the final aggregated feature set to separate the relevant features from the redundant ones. A fixed threshold, which is typically applied, offers no guarantee that the final set of selected features contains only relevant features. This work develops several data-driven thresholds to automatically identify the relevant…
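The idea of aggregating base selectors and then cutting the aggregate with a data-driven threshold can be illustrated with a minimal sketch. The abstract is truncated before it names the paper's thresholds, so everything below is an assumption for illustration only and not the authors' method: the base selector is absolute Pearson correlation on bootstrap resamples, the aggregate is each feature's top-k selection frequency, and the cutoff is the highest frequency reached by appended pure-noise "probe" features. The function names, parameters, and synthetic data are all hypothetical.

```python
import numpy as np


def ensemble_selection_frequencies(X, y, n_rounds=50, top_k=5, seed=0):
    """Fraction of bootstrap rounds in which each feature ranks in the top_k."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        idx = rng.integers(0, n_samples, size=n_samples)  # bootstrap resample
        Xb, yb = X[idx], y[idx]
        # Assumed base selector: absolute Pearson correlation with the outcome.
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        scores = np.abs(Xc.T @ yc) / denom
        counts[np.argsort(scores)[-top_k:]] += 1  # vote for the top_k features
    return counts / n_rounds


def probe_threshold(X, y, n_probes=20, seed=0, **kwargs):
    """Assumed data-driven cutoff: the best frequency achieved by noise probes."""
    rng = np.random.default_rng(seed)
    probes = rng.normal(size=(X.shape[0], n_probes))  # known-irrelevant features
    freqs = ensemble_selection_frequencies(
        np.hstack([X, probes]), y, seed=seed, **kwargs
    )
    n_real = X.shape[1]
    return freqs[:n_real], freqs[n_real:].max()


# Synthetic data (hypothetical): 30 features, only the first three are relevant.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 30))
y = X[:, 0] + X[:, 1] + X[:, 2] + 0.5 * data_rng.normal(size=200)

real_freqs, cutoff = probe_threshold(X, y)
selected = np.flatnonzero(real_freqs > cutoff)  # features that beat every probe
print("cutoff:", cutoff)
print("selected features:", selected.tolist())
```

Unlike a fixed cutoff such as "keep the top 10 features", the probe-based cutoff moves with the data: a feature is kept only if the ensemble selects it more often than any feature known to be irrelevant. Irrelevant real features can still occasionally pass such a cutoff, so this is a sketch of the general idea rather than a guarantee.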
Taxonomy
Topics: Gene expression and cancer classification · Artificial Intelligence in Healthcare · Bioinformatics and Genomic Networks
Methods: Feature Selection · Balanced Selection
