StackFeat: a convergent algorithm for optimal predictor selection in genomic data
Akbar Yermekov, D.A. Herrera-Martí

TL;DR
StackFeat is an iterative feature selection algorithm for high-dimensional genomic data that combines effect strength and selection frequency to identify stable, predictive biomarkers, demonstrated on COVID-19 miRNA data.
Contribution
The paper introduces StackFeat, a convergent algorithm that improves feature-selection stability for biomarker discovery in genomic data by retaining only features that rank highly on two criteria accumulated across repeated cross-validation: signed coefficients and selection frequency.
Findings
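On the COVID-19 miRNA dataset GSE240888, StackFeat identified a stable 5-miRNA signature from 332 features (98.5% feature reduction).
The signature achieved an AUC of 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016).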
The method discovered known and novel biomarkers related to COVID-19.
Abstract
In high-dimensional genomic data, the curse of dimensionality (d >> n) and limited sampling make feature selection inherently unstable - a critical barrier to biomarker discovery. We introduce StackFeat, an iterative algorithm that accumulates two statistics across repeated cross-validation: signed coefficients (measuring effect strength and direction) and selection frequencies (estimating selection probability). Only features ranking highly by both criteria are retained. On a COVID-19 miRNA dataset (GSE240888), StackFeat identified a stable 5-miRNA signature from 332 features (98.5% reduction), achieving AUC 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016). The signature includes hsa-miR-150-5p, a marker implicated in both COVID-19 survival and Dengue infection. This dual-criterion approach provides convergence guarantees absent in single-criterion…
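The dual-criterion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner (`fit_sparse_stand_in`, a thresholded class-mean difference standing in for a sparse linear model), the fixed number of repeats, the `top_k` intersection rule, and the synthetic data are all assumptions made for illustration. The paper's actual learner, thresholds, and convergence criterion are not given in this summary, so the sketch runs a fixed number of repeats rather than checking convergence.

```python
import random
from statistics import mean

def fit_sparse_stand_in(X, y, keep=3):
    """Hypothetical stand-in for a sparse linear model (NOT the paper's learner).
    Returns one signed score per feature; all but the `keep` largest
    in magnitude are set to zero, mimicking sparse coefficients."""
    d = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if not pos or not neg:  # degenerate fold: nothing can be estimated
        return [0.0] * d
    diff = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(d)]
    top = set(sorted(range(d), key=lambda j: abs(diff[j]), reverse=True)[:keep])
    return [diff[j] if j in top else 0.0 for j in range(d)]

def dual_criterion_select(X, y, n_repeats=20, n_folds=5, top_k=3, seed=0):
    """Accumulate signed coefficients and selection counts over repeated
    cross-validation, then keep features in the top_k by BOTH criteria."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    coef_sum = [0.0] * d   # criterion 1: accumulated signed effect
    counts = [0] * d       # criterion 2: how often each feature was selected
    fits = 0
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(n_folds):
            held_out = set(idx[f::n_folds])
            train = [i for i in idx if i not in held_out]
            coefs = fit_sparse_stand_in([X[i] for i in train],
                                        [y[i] for i in train], keep=top_k)
            for j, c in enumerate(coefs):
                coef_sum[j] += c
                if c != 0.0:
                    counts[j] += 1
            fits += 1
    freq = [c / fits for c in counts]  # empirical selection frequency
    by_effect = set(sorted(range(d), key=lambda j: abs(coef_sum[j]), reverse=True)[:top_k])
    by_freq = set(sorted(range(d), key=lambda j: freq[j], reverse=True)[:top_k])
    return sorted(by_effect & by_freq), coef_sum, freq

# Synthetic data (assumption): 40 samples, 30 features, features 0 and 1 informative.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[(3.0 * label if j == 0 else -3.0 * label if j == 1 else 0.0) + data_rng.gauss(0, 1)
      for j in range(30)] for label in y]

selected, coef_sum, freq = dual_criterion_select(X, y)
print("selected features:", selected)
```

Requiring a feature to pass both rankings filters out features that have a large coefficient in a few folds but are rarely selected, as well as features that are selected often but with inconsistent sign, since opposite-signed coefficients cancel in the accumulated sum.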
