# Rank-based learning: a novel high-throughput algorithm resilient to missing data and effective for datasets with small sample size

**Authors:** Lulu Song, Hamid Khoshfekr Rudsari, Johannes F Fahrmann, Jody Vykoukal, Sam Hanash, James P Long, Kim-Anh Do, Ehsan Irajizad

PMC · DOI: 10.1093/bib/bbaf666 · Briefings in Bioinformatics · 2025-12-12

## TL;DR

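Rank-Based Learning (RBL) is a binary classification method for omics data that uses relative feature rankings rather than absolute values. In two plasma proteomics datasets it reached higher test AUCs than Lasso-regularized Logistic Regression and Random Forest.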

## Contribution

The paper introduces Rank-Based Learning (RBL), a binary classifier built on relative feature rankings and intended to remain robust and generalizable on high-throughput data with batch effects, missing values, and small sample sizes.

## Key findings

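- In simulations, RBL outperformed Logistic Regression (LR) under batch effects, missing data, and varying numbers of true differential features.
- RBL achieved higher AUCs than LR with Lasso and Random Forest (RF) with feature importance in diagnosing small cell lung cancer (test AUC 0.76 vs. 0.65 and 0.59) and duodenopancreatic neuroendocrine tumors (test AUC 0.80 vs. 0.57 and 0.53).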
- RBL's focus on relative feature rankings reduces the impact of non-biological variation in omics data.
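
The last point can be illustrated with a minimal sketch. It is not the authors' RBL implementation, whose details are in the full paper; it only shows the underlying idea, assuming a batch effect that acts as a positive per-sample scale and shift. The function `within_sample_ranks` and the simulated values are hypothetical names and data chosen for illustration.

```python
import numpy as np

def within_sample_ranks(X):
    """Rank features within each sample (row), keeping NaN where data is missing.

    Ranks are divided by the number of observed features in the row, so samples
    with different amounts of missing data stay on a comparable (0, 1] scale.
    """
    X = np.asarray(X, dtype=float)
    ranks = np.full(X.shape, np.nan)
    for i, row in enumerate(X):
        observed = ~np.isnan(row)
        n_obs = int(observed.sum())
        if n_obs == 0:
            continue  # nothing to rank in a fully missing sample
        # argsort of argsort yields 0-based ordinal ranks (ties broken by position)
        order = row[observed].argsort(kind="stable").argsort()
        ranks[i, observed] = (order + 1) / n_obs
    return ranks

# Simulated intensities: 4 samples x 6 features, with one missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
X[1, 2] = np.nan

# Assumed batch effect: each sample gets its own positive scale and shift.
scale = np.array([1.0, 2.5, 0.5, 3.0])[:, None]
shift = np.array([0.0, 4.0, -1.0, 10.0])[:, None]
X_batch = X * scale + shift

ranks_clean = within_sample_ranks(X)
ranks_batch = within_sample_ranks(X_batch)

# Absolute values change, but the within-sample ranks do not.
ranks_unchanged = np.allclose(ranks_clean, ranks_batch, equal_nan=True)
print("values changed:", not np.allclose(X, X_batch, equal_nan=True))
print("ranks unchanged:", ranks_unchanged)
```

Any strictly increasing per-sample distortion preserves the ordering of features within that sample, which is why a model built on ranks is unaffected by it, whereas a model built on raw values sees shifted inputs.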

## Abstract

High-throughput omics data present challenges for binary classification due to platform variability, batch effects, missing values, and high dimensionality. This study presents a novel Rank-Based Learning (RBL) method that leverages relative feature rankings to improve robustness and generalizability. We evaluated RBL against established methods like Logistic Regression (LR) and Random Forest (RF) using simulated data and two real-world plasma proteomics datasets: early-stage small cell lung cancer (SCLC) and duodenopancreatic neuroendocrine tumors (dpNET) in patients with Multiple Endocrine Neoplasia type 1 (MEN1). In simulation experiments, RBL outperformed LR under conditions involving batch effects, missing data, and varying numbers of true differential features. In SCLC, RBL yielded a test AUC of 0.76 (95% CI: 0.42–1.00), surpassing LR with Lasso (0.65 [95% CI: 0.47–0.84]) and RF with feature importance (0.59 [95% CI: 0.33–0.87]). In dpNET, RBL achieved an AUC of 0.83 (95% CI: 0.67–0.97) on the development set and 0.80 (95% CI: 0.54–0.98) on the test set, outperforming LR with Lasso (0.57 [95% CI: 0.40–0.77]) and RF with feature importance (0.53 [95% CI: 0.29–0.77]). By emphasizing feature ranking rather than absolute expression levels, RBL effectively mitigates the impact of non-biological variation. Overall, RBL improves the predictive accuracy of diagnostic models for complex diseases and provides a promising framework for developing more reliable and generalizable diagnostic tools from omics data, moving them closer to clinical application.
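
The wide intervals reported above (for example 0.42–1.00 for SCLC) are typical of small test sets. The sketch below shows one common way to obtain an AUC with a 95% interval, a percentile bootstrap. This summary does not state which interval method the authors used, so the procedure, the function names `auc_score` and `bootstrap_auc_ci`, and the synthetic labels and scores are all assumptions for illustration.

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        yb = y_true[idx]
        if yb.min() == yb.max():
            continue  # resample lacks one class, so AUC is undefined
        aucs.append(auc_score(yb, scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_score(y_true, scores), float(lo), float(hi)

# Synthetic small test set: 10 controls and 10 cases with overlapping scores.
rng = np.random.default_rng(1)
y = np.array([0] * 10 + [1] * 10)
s = np.concatenate([rng.normal(0.0, 1.0, 10), rng.normal(1.0, 1.0, 10)])

point, lo, hi = bootstrap_auc_ci(y, s)
print(f"AUC = {point:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```

With only a few dozen samples, each resample can shift the AUC substantially, so the interval stays wide even when the point estimate looks favorable.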

## Linked entities

- **Diseases:** small cell lung cancer (MONDO:0008433), Multiple Endocrine Neoplasia type 1 (MONDO:0007540)

## Full-text entities

- **Genes:** MEN1 (menin 1) [NCBI Gene 4221] {aka MEAI, SCG2}, IGFBP2 (insulin like growth factor binding protein 2) [NCBI Gene 3485] {aka IBP2, IGF-BP53}, TIMP1 (TIMP metallopeptidase inhibitor 1) [NCBI Gene 7076] {aka CLGI, EPA, EPO, HCI, TIMP, TIMP-1}, KSR2 (kinase suppressor of ras 2) [NCBI Gene 283455], TFRC (transferrin receptor) [NCBI Gene 7037] {aka CD71, IMD46, T9, TFR, TFR1, TR}, CHI3L1 (chitinase 3 like 1) [NCBI Gene 1116] {aka ASRT7, CGP-39, GP-39, GP39, HC-gp39, HCGP-3P}, FUT1 (fucosyltransferase 1 (H blood group)) [NCBI Gene 2523] {aka H, HH, HSC}, ACTB (actin beta) [NCBI Gene 60] {aka BKRNS, BNS, BRWS1, CSMH, DDS1, PS1TP5BP1}, COL18A1 (collagen type XVIII alpha 1 chain) [NCBI Gene 80781] {aka GLCC, KNO, KNO1, KS}, NCAM1 (neural cell adhesion molecule 1) [NCBI Gene 4684] {aka CD56, MSK39, NCAM}
- **Diseases:** Digestive and Kidney Diseases (MESH:D007674), breast cancer (MESH:D001943), gastric NET (MESH:D013272), thymoma (MESH:D013945), liver metastases (MESH:D009362), MEN1 (MESH:D018761), neuroendocrine carcinoma (MESH:D018278), RBL (MESH:D007859), dpNET (MESH:D018358), PDAC (MESH:D021441), SCLC (MESH:D055752), lung (MESH:D008171), Diabetes (MESH:D003920), Cancer (MESH:D009369), Lung Cancer (MESH:D008175)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12914468/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12914468/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12914468/full.md

---
Source: https://tomesphere.com/paper/PMC12914468