# An explainable-AI framework reveals novel lncRNAs specific for breast cancer subtypes

**Authors:** Jai Chand Patel, Avinash Veerappa, Chittibabu Guda

PMC · DOI: 10.3389/fbinf.2026.1760987 · 2026-03-10

## TL;DR

This study uses an explainable AI framework to identify lncRNAs specific to breast cancer subtypes, showing their potential for cancer subtyping and biomarker discovery.

## Contribution

The novel contribution is the systematic evaluation of lncRNA-only and integrative models for multi-class breast cancer subtyping using an explainable AI framework.

## Key findings

- XGBoost using lncRNAs alone achieved 89.2% accuracy in breast cancer subtyping.
- Explainable AI identified subtype-specific biomarker panels with unique lncRNA features for each subtype.
- Novel subtype-specific lncRNAs, such as CUFF.25255 (LumA) and CUFF.26607 (Basal), were significant in survival analysis.

## Abstract

Long non-coding RNAs (lncRNAs) have emerged as important regulators in cancer biology, yet their potential for cancer subtyping remains underexplored, particularly in large-scale, multi-class supervised classification frameworks. This is because publicly available data are limited, or because lncRNAs have been used only as auxiliary features in classification tasks.

In this study, we used an expansive set of 7,177 lncRNAs obtained from 1,021 breast cancer (BRCA) transcriptomics datasets for subtyping with an explainable artificial intelligence (AI) framework. lncRNA, mRNA, and miRNA features were used to build machine learning (ML) models individually and in combination. Four ML classifiers (Naïve Bayes, Random Forest, Artificial Neural Network, and XGBoost) were employed to evaluate subtype classification performance.
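To illustrate what building models "individually and in combination" can look like in practice, here is a minimal sketch that enumerates feature-block combinations and concatenates them column-wise per sample. The function names, block names, and toy values are hypothetical. The sketch enumerates all seven non-empty combinations, which may be more than the study actually evaluated.

```python
from itertools import combinations


def feature_set_combinations(blocks):
    """Return every non-empty combination of feature-block names."""
    names = list(blocks)
    combos = []
    for size in range(1, len(names) + 1):
        combos.extend(combinations(names, size))
    return combos


def merge_blocks(blocks, combo):
    """Concatenate the selected blocks column-wise, one row per sample."""
    n_samples = len(blocks[combo[0]])
    merged = []
    for i in range(n_samples):
        row = []
        for name in combo:
            row.extend(blocks[name][i])
        merged.append(row)
    return merged


# Toy expression values (hypothetical): 3 samples per block.
toy_blocks = {
    "lncRNA": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "mRNA": [[1.0], [2.0], [3.0]],
    "miRNA": [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
}

combos = feature_set_combinations(toy_blocks)
lnc_plus_mrna = merge_blocks(toy_blocks, ("lncRNA", "mRNA"))
```

Each merged matrix would then be passed to a classifier of choice, so that accuracy can be compared across feature sets on the same samples.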

Using lncRNAs alone, XGBoost demonstrated strong performance, with an accuracy of 89.2% and an AUROC of 0.99. Adding miRNA or mRNA features to lncRNA features marginally improved the accuracy to 90.8% and 92.2%, respectively, while using all three feature types together provided no further gain. A sequential key feature identification pipeline (ANOVA, Boruta, SHAP) identified interpretable subtype-specific biomarker panels, yielding 119, 66, 54, and 24 unique features for the Luminal A, Luminal B, HER2+, and Basal subtypes, respectively. Further lncRNA characterization followed by survival analysis revealed significant subtype-specific novel lncRNAs, including CUFF.25255 (LumA), CUFF.20237 and CUFF.3888 (LumB), CUFF.22414 (HER2+), and CUFF.26607 and CUFF.1961 (Basal).
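The first stage of a sequential pipeline like the one described (ANOVA, then Boruta, then SHAP) can be sketched with a one-way ANOVA F-statistic computed per feature across subtype labels. This is a minimal stdlib-only illustration, not the authors' implementation. The feature names, toy values, and the F cutoff `min_f` are assumptions, and a real analysis would more likely filter on a p-value. The Boruta and SHAP stages are not shown.

```python
def one_way_anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature across label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more samples than groups")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_filter(matrix, labels, feature_names, min_f):
    """Keep features whose F-statistic reaches min_f, highest F first."""
    kept = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in matrix]
        f_stat = one_way_anova_f(column, labels)
        if f_stat >= min_f:
            kept.append((name, f_stat))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Toy data (hypothetical): 9 samples, 2 features, 3 subtypes.
toy_labels = ["LumA"] * 3 + ["LumB"] * 3 + ["Basal"] * 3
toy_matrix = [
    [1, 1], [2, 2], [3, 3],   # LumA
    [2, 1], [3, 2], [4, 3],   # LumB
    [5, 1], [6, 2], [7, 3],   # Basal
]
selected = anova_filter(toy_matrix, toy_labels, ["lnc_A", "lnc_B"], min_f=4.0)
```

In this toy case only `lnc_A` survives, because its group means differ while `lnc_B` has identical means in every subtype. Features passing such a filter would then go on to the wrapper (Boruta) and attribution (SHAP) stages.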

Our findings highlight the diagnostic and biomarker discovery potential of lncRNAs. The explainable-AI framework implemented here provides a systematic, large-scale evaluation of lncRNA-only and integrative models for multi-class BRCA subtyping, and it can be adapted to other cancers using the existing cancer transcriptomics data in public databases.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** TMEM43 (transmembrane protein 43) [NCBI Gene 79188] {aka ARVC5, ARVD5, AUNA3, EDMD7, EDMD7; AUNA2, LUMA}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}
- **Diseases:** cancer (MESH:D009369), BRCA (MESH:D001943)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13008977/full.md

---
Source: https://tomesphere.com/paper/PMC13008977