Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

Pushpa Kumar Balan; Aijing Feng

arXiv:2604.14334·q-bio.QM·April 20, 2026

TL;DR

The paper reports that LLM chain-of-thought reasoning can filter tissue-composition confounders out of a gradient-saliency gene list, yielding a 17-gene set that outperforms a 5,000-gene variance baseline on TCGA-BRCA.

Contribution

The paper introduces a pipeline that ranks genes by gradient saliency from a Mamba SSM, filters the top 50 candidates with DeepSeek-R1 chain-of-thought reasoning, and audits the faithfulness of the selected genes against COSMIC CGC, OncoKB, and PAM50.

Findings

01

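On the held-out test split, the LLM-filtered 17-gene set reaches AUC 0.927, compared with 0.903 for the 5,000-gene variance baseline and 0.832 for the raw 50-gene saliency set.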

02

6 of the 17 selected genes (35.3%) are validated BRCA biomarkers according to the faithfulness audit.

03

Targeted confounder removal enhances predictive performance without full recall.
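
The headline ratios can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the abstract (5,000 baseline genes, 17 selected genes, 6 validated genes, and the three AUC values); the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract; variable names are illustrative only.
baseline_features = 5000   # size of the variance baseline
selected_features = 17     # size of the LLM-filtered gene set
validated = 6              # selected genes that are validated BRCA biomarkers

auc_baseline = 0.903       # 5,000-gene variance baseline
auc_raw_saliency = 0.832   # raw top-50 saliency set, no LLM
auc_llm_filtered = 0.927   # 17-gene LLM-filtered set

# Feature reduction factor and validated fraction.
reduction = baseline_features / selected_features
validated_pct = 100 * validated / selected_features

# AUC differences relative to the two comparison sets.
gain_vs_baseline = auc_llm_filtered - auc_baseline
gain_vs_raw = auc_llm_filtered - auc_raw_saliency

print(f"{reduction:.0f}x fewer features")
print(f"{validated_pct:.1f}% of selected genes validated")
print(f"AUC gain vs. baseline: {gain_vs_baseline:.3f}")
print(f"AUC gain vs. raw saliency: {gain_vs_raw:.3f}")
```

The 294x figure is 5,000 / 17 rounded to the nearest integer, and 35.3% is 6 / 17.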

Abstract

Gradient saliency from deep sequence models surfaces candidate biomarkers efficiently, but the resulting gene lists can be contaminated by tissue-composition confounders that degrade downstream classifiers. We study whether LLM chain-of-thought (CoT) reasoning can filter these confounders, and whether reasoning quality is associated with downstream performance. We train a Mamba SSM on TCGA-BRCA RNA-seq and extract the top-50 genes by gradient saliency; DeepSeek-R1 evaluates every candidate with structured CoT to produce a final 17-gene set. On the held-out test split, the raw 50-gene saliency set (no LLM) performs worse than a 5,000-gene variance baseline (AUC 0.832 vs. 0.903), while the LLM-filtered set surpasses it (AUC 0.927), using 294x fewer features. A faithfulness audit (COSMIC CGC, OncoKB, PAM50) shows that 6 of 17 selected genes (35.3%) are validated BRCA biomarkers, while 10…
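
The two-stage selection described in the abstract (rank genes by gradient saliency, then filter the candidates) can be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' implementation: a logistic model stands in for the Mamba SSM so that the input gradient has a closed form, the data are synthetic, and the `verdicts` dictionary is a hypothetical stand-in for the keep/drop decisions that DeepSeek-R1 would produce.

```python
import numpy as np

def gradient_saliency(X, w, b):
    """Mean absolute gradient of the predicted probability w.r.t. each input feature.

    Assumes a logistic model p = sigmoid(X @ w + b), used here only as a
    stand-in for a deep sequence model.
    """
    logits = X @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    # For a logistic model, dp/dx_j = p * (1 - p) * w_j.
    grads = (p * (1.0 - p))[:, None] * w[None, :]
    return np.abs(grads).mean(axis=0)

def top_k(scores, k):
    """Indices of the k highest-scoring features, highest first."""
    return np.argsort(-scores, kind="stable")[:k]

# Synthetic expression matrix: 64 samples, 200 "genes" (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 200))

# Only three genes carry signal in this toy model.
w = np.zeros(200)
w[[3, 17, 42]] = [2.0, -1.5, 1.0]
b = 0.0

scores = gradient_saliency(X, w, b)
candidates = top_k(scores, k=3)

# Hypothetical keep/drop verdicts standing in for the LLM filtering step.
verdicts = {3: True, 17: False, 42: True}
kept = [int(g) for g in candidates if verdicts[int(g)]]

print("candidates:", candidates.tolist())
print("kept after filtering:", kept)
```

In the paper's setting, the candidate list has 50 genes and the filtering step reduces it to 17; the toy sizes here are chosen only to keep the example short.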
