Multi-Modal Machine Learning for Population- and Subject-Specific lncRNA-Type 2 Diabetes Association Analysis
Ashwani Siwach, Sanjeev Narayan Sharma, Sunil Datt Sharma

TL;DR
This study develops a multi-modal machine learning framework that integrates expression, secondary-structure, and sequence features of lncRNAs to identify associations with Type 2 Diabetes, with the aim of supporting precision medicine.
Contribution
The paper introduces a novel multi-feature ML approach combining expression, structure, and sequence data for lncRNA-T2D association analysis, validated across two cohorts.
Findings
GAS5 and XIST expression features linked to T2D in one cohort
MEG3 identified as dominant lncRNA across cohorts
ML results align with statistical methods and reveal molecular feature associations
Abstract
Long non-coding RNAs (lncRNAs) are emerging regulatory molecules implicated in chronic disease pathogenesis, including Type 2 Diabetes Mellitus (T2D). We investigated ten literature-reported lncRNAs associated with T2D (MALAT1, MEG3, MIAT, ANRIL, GAS5, KCNQ1OT1, H19, BCYRN1, XIST, and HOTAIR) across two independent population-based RNA-seq cohorts. Single-omics approaches provide an incomplete view of disease biology; therefore, an integrative multi-feature framework was developed, extracting expression, secondary-structure, and sequence features for each lncRNA. Eight machine learning (ML) classifiers were evaluated under stratified k-fold, leave-one-out cross-validation (LOOCV), and repeated hold-out schemes to ensure robust performance estimation. SHAP analysis was applied for subject-level association interpretation. In one cohort, GAS5 and XIST expression features, along with GAS5,…
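
The three validation schemes named in the abstract differ only in how subjects are split into training and test sets. The following is a minimal sketch of those splits in plain Python, not the authors' pipeline: the function names, fold count, test fraction, repeat count, seeds, and toy labels are all assumptions made for illustration, and no classifier or real cohort data is involved.

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by class label."""
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    return groups


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train, test) index lists; each fold roughly keeps class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idxs in _indices_by_class(labels).values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin across the k folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def leave_one_out(n):
    """Yield (train, test) index lists where each sample is the test set once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def repeated_holdout(labels, test_frac=0.3, repeats=10, seed=0):
    """Yield `repeats` independent stratified random train/test splits."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    for _ in range(repeats):
        test = []
        for idxs in groups.values():
            n_test = max(1, round(test_frac * len(idxs)))
            test.extend(rng.sample(idxs, n_test))
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


# Toy labels (hypothetical): 1 = T2D, 0 = control.
labels = [0] * 12 + [1] * 8
skf_splits = list(stratified_kfold(labels, k=4))
loo_splits = list(leave_one_out(len(labels)))
rho_splits = list(repeated_holdout(labels, test_frac=0.25, repeats=5))
```

In practice each `(train, test)` pair would be used to fit a classifier on the training indices and score it on the test indices, with scores averaged per scheme. LOOCV tends to suit small cohorts, while repeated hold-out gives a spread of estimates across random splits.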
