# Benchmarking Sparse Variable Selection Methods for Genomic Data Analyses

**Authors:** Hema Sri Sai Kollipara, Tapabrata Maiti, Sanjukta Chakraborty, Samiran Sinha

PMC · DOI: 10.1002/sim.70428 · Statistics in Medicine · 2026-02-10

## TL;DR

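This paper compares variable selection methods for genomic data, mainly Bayesian approaches alongside LASSO and a proposed two-step procedure. No single method wins in every scenario: the most competitive choice depends on whether features are correlated and on which metric is used.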

## Contribution

The paper proposes RFSFS, a simple two-step procedure, and benchmarks it against shrinkage, global–local, and mixture priors, SuSiE, and LASSO on FDR, FNR, F-score, and mean squared prediction error across several simulation scenarios.

## Key findings

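- LASSO, the spike-and-slab prior with normal slab (SN), and RFSFS are the most competitive methods (by average rank across scenarios) for FDR and F-score when features are uncorrelated.
- SN, SuSiE, and RFSFS are the most competitive methods for FDR when features are correlated.
- With correlated features, LASSO has an edge over SuSiE in F-score.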
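The SN prior named above is not written out in this summary. One common formulation, given here as an assumption rather than the paper's exact specification (the symbols $\gamma_j$, $\pi$, $\tau^2$, and $\sigma^2$ are ours), places a point mass at zero (the "spike") and a normal distribution (the "slab") on each regression coefficient:

$$
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n),\\
\beta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \tau^2), \qquad j = 1, \dots, p,\\
\gamma_j &\sim \operatorname{Bernoulli}(\pi).
\end{aligned}
$$

Some formulations scale the slab variance by $\sigma^2$. Feature $j$ is then typically selected when its posterior inclusion probability $P(\gamma_j = 1 \mid y)$ exceeds 0.5 (the median probability model); whether the paper uses this threshold is not stated here.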

## Abstract

Genomic and other studies often involve many features, and selecting the essential ones with high accuracy is desirable. In recent years, Bayesian inference for variable (or feature) selection has advanced significantly. However, more practical information is needed on how to implement these methods and how they perform relative to one another. Our goal in this paper is to compare approaches, mainly from different Bayesian genres, that apply to genomic analysis. In particular, we examine how well shrinkage, global–local, and mixture priors, SuSiE, and a simple two-step procedure that we propose, RFSFS, perform under various simulation scenarios in terms of several metrics: false discovery rate (FDR), false negative rate (FNR), F-score, and mean squared prediction error. No single method uniformly outperforms the others across all scenarios and across both variable selection and prediction metrics, so we order the methods by their average ranking across scenarios. We found that LASSO, the spike-and-slab prior with normal slab (SN), and RFSFS are the most competitive methods for FDR and F-score when features are uncorrelated. When features are correlated, SN, SuSiE, and RFSFS are the most competitive methods for FDR, whereas LASSO has an edge over SuSiE in terms of F-score. For illustration, we have applied these methods to renal cell carcinoma (RCC) data from The Cancer Genome Atlas Program (TCGA) and have offered methodological guidance.
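To make the evaluation metrics and the average-ranking step concrete, here is a minimal Python sketch. The exact definitions the authors use are not given in this summary, so the conventions below are assumptions: FNR is taken as the share of truly active features that are missed, FDR is set to 0 when nothing is selected, and tied methods share the mean of their ranks. The function names `selection_metrics`, `mspe`, and `average_ranks` are hypothetical, and the example numbers are invented purely to exercise the code.

```python
from statistics import mean


def selection_metrics(true_support, selected):
    """Return (FDR, FNR, F-score) for one selected feature set.

    Assumed conventions (not confirmed by the paper):
      FDR = FP / (TP + FP), defined as 0 when nothing is selected
      FNR = FN / (TP + FN), the share of truly active features missed
      F   = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    """
    truth, chosen = set(true_support), set(selected)
    tp = len(truth & chosen)   # active features correctly selected
    fp = len(chosen - truth)   # noise features wrongly selected
    fn = len(truth - chosen)   # active features missed
    fdr = fp / (tp + fp) if chosen else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 1.0  # both sets empty: perfect agreement
    return fdr, fnr, f_score


def mspe(y_true, y_pred):
    """Mean squared prediction error on held-out observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))


def average_ranks(scores, higher_is_better=False):
    """Order methods by their mean rank across simulation scenarios.

    scores maps scenario -> {method: metric value}. Within each scenario,
    rank 1 is best; methods with exactly equal values share the mean of
    the positions they occupy. Returns [(mean_rank, method), ...], best first.
    """
    totals, counts = {}, {}
    for by_method in scores.values():
        ordered = sorted(by_method, key=by_method.get, reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            # Extend j over the run of methods tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and by_method[ordered[j + 1]] == by_method[ordered[i]]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for method in ordered[i : j + 1]:
                totals[method] = totals.get(method, 0.0) + tied_rank
                counts[method] = counts.get(method, 0) + 1
            i = j + 1
    return sorted((totals[m] / counts[m], m) for m in totals)


if __name__ == "__main__":
    # Toy data, invented for illustration: 5 active features; a method
    # selects 4 of them plus 2 noise features.
    print(selection_metrics(range(5), [0, 1, 2, 3, 10, 11]))
    # Hypothetical FDR values for three unnamed methods in two scenarios.
    fdr_by_scenario = {
        "scenario_1": {"method_a": 0.10, "method_b": 0.05, "method_c": 0.05},
        "scenario_2": {"method_a": 0.02, "method_b": 0.08, "method_c": 0.04},
    }
    print(average_ranks(fdr_by_scenario))  # lower FDR is better
    # → [(1.75, 'method_c'), (2.0, 'method_a'), (2.25, 'method_b')]
```

Ranks are computed separately for each metric: lower is better for FDR, FNR, and mean squared prediction error, and higher is better for F-score (pass `higher_is_better=True`). Producing one ordering per metric fits the way the abstract reports separate winners for FDR and F-score.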

## Linked entities

- **Diseases:** renal cell carcinoma (MONDO:0005086)

## Full-text entities

- **Diseases:** RCC (MESH:D002292), Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12888550/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12888550/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC12888550/full.md

---
Source: https://tomesphere.com/paper/PMC12888550