StackFeat: a convergent algorithm for optimal predictor selection in genomic data

Akbar Yermekov; D.A. Herrera-Martí

arXiv:2604.22887·q-bio.OT·April 28, 2026


TL;DR

StackFeat is an iterative feature selection algorithm for high-dimensional genomic data that combines effect strength and selection frequency to identify stable, predictive biomarkers, demonstrated on COVID-19 miRNA data.

Contribution

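The paper introduces StackFeat, an iterative feature selection algorithm that retains only features ranking highly on both accumulated signed coefficients and selection frequency. The authors state that this dual criterion provides convergence guarantees and improves the stability of biomarker discovery in genomic data.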

Findings

01

StackFeat identified a stable 5-miRNA signature from 332 candidate features on the COVID-19 miRNA dataset GSE240888, a 98.5% reduction.

02

The signature achieved an AUC of 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016).

03

The method discovered known and novel biomarkers related to COVID-19.

Abstract

In high-dimensional genomic data, the curse of dimensionality (d >> n) and limited sampling make feature selection inherently unstable, a critical barrier to biomarker discovery. We introduce StackFeat, an iterative algorithm that accumulates two statistics across repeated cross-validation: signed coefficients (measuring effect strength and direction) and selection frequencies (estimating selection probability). Only features ranking highly by both criteria are retained. On a COVID-19 miRNA dataset (GSE240888), StackFeat identified a stable 5-miRNA signature from 332 features (98.5% reduction), achieving AUC 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016). The signature includes hsa-miR-150-5p, a marker implicated in both COVID-19 survival and Dengue infection. This dual-criterion approach provides convergence guarantees absent in single-criterion…
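The dual-criterion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the underlying model, the ranking thresholds, or the convergence rule, so everything below is an assumption made for illustration. In particular, a standardized difference in class means stands in for a fitted model coefficient, and the names `signed_effect`, `dual_criterion_select`, `make_toy_data`, `per_fold_k`, and `final_k` are hypothetical. The sketch only shows how signed effects and selection counts might be accumulated across repeated cross-validation and then intersected.

```python
import random
import statistics


def signed_effect(values, labels):
    """Standardized difference in class means for one feature.

    Assumption: this univariate statistic stands in for a fitted model
    coefficient; the abstract does not say which model StackFeat uses.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    if not pos or not neg:
        return 0.0  # effect is undefined when a class is missing
    spread = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return (statistics.fmean(pos) - statistics.fmean(neg)) / spread


def dual_criterion_select(X, y, n_repeats=10, n_folds=5,
                          per_fold_k=5, final_k=5, seed=0):
    """Accumulate signed effects and selection counts over repeated CV.

    Returns (selected, coef_sum, freq), where `selected` holds the indices
    that are in the top `final_k` by |accumulated effect| AND by frequency.
    """
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    coef_sum = [0.0] * n_features   # criterion 1: signed effect strength
    sel_count = [0] * n_features    # criterion 2: how often selected
    n_fits = 0

    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            held_out = set(order[fold::n_folds])
            train = [i for i in order if i not in held_out]
            labels = [y[i] for i in train]
            effects = [signed_effect([X[i][j] for i in train], labels)
                       for j in range(n_features)]
            ranked = sorted(range(n_features),
                            key=lambda j: abs(effects[j]), reverse=True)
            for j in ranked[:per_fold_k]:  # features "selected" on this fold
                coef_sum[j] += effects[j]
                sel_count[j] += 1
            n_fits += 1

    freq = [count / n_fits for count in sel_count]
    top_strength = set(sorted(range(n_features),
                              key=lambda j: abs(coef_sum[j]),
                              reverse=True)[:final_k])
    top_freq = set(sorted(range(n_features),
                          key=lambda j: freq[j], reverse=True)[:final_k])
    return sorted(top_strength & top_freq), coef_sum, freq


def make_toy_data(n_samples=60, n_features=40, shift=4.0, seed=1):
    """Synthetic data: feature 0 is raised and feature 1 lowered in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += shift * label
        row[1] -= shift * label
        X.append(row)
    return X, y


X, y = make_toy_data()
selected, coef_sum, freq = dual_criterion_select(X, y, per_fold_k=2, final_k=2)
print("selected features:", selected)
print("selection frequencies:", [round(freq[j], 2) for j in selected])
```

Keeping the sign of the accumulated effect means a feature whose direction flips between folds partially cancels itself out, while the frequency criterion penalizes features that are strong on only a few partitions. The sketch runs a fixed number of repeats and does not attempt to reproduce the paper's stopping rule or convergence guarantee.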
