Multi-Modal Machine Learning for Population- and Subject-Specific lncRNA-Type 2 Diabetes Association Analysis

Ashwani Siwach; Sanjeev Narayan Sharma; Sunil Datt Sharma

arXiv:2605.20747·q-bio.GN·May 22, 2026

TL;DR

This study develops a multi-modal machine learning framework that integrates expression, secondary-structure, and sequence features of lncRNAs to identify their associations with Type 2 Diabetes at both the population and subject level, in support of precision medicine.

Contribution

The paper introduces a multi-feature ML approach that combines expression, secondary-structure, and sequence features for lncRNA-T2D association analysis and evaluates it on two independent population-based RNA-seq cohorts.

Findings

01

GAS5 and XIST expression features linked to T2D in one cohort

02

MEG3 identified as dominant lncRNA across cohorts

03

ML results align with statistical methods and reveal molecular feature associations

Abstract

Long non-coding RNAs (lncRNAs) are emerging regulatory molecules implicated in chronic disease pathogenesis, including Type 2 Diabetes Mellitus (T2D). We investigated ten literature-reported lncRNAs associated with T2D (MALAT1, MEG3, MIAT, ANRIL, GAS5, KCNQ1OT1, H19, BCYRN1, XIST, and HOTAIR) across two independent population-based RNA-seq cohorts. Because single-omics approaches provide an incomplete view of disease biology, an integrative multi-feature framework was developed that extracts expression, secondary-structure, and sequence features for each lncRNA. Eight machine learning (ML) classifiers were evaluated under stratified k-fold, leave-one-out cross-validation (LOOCV), and repeated hold-out schemes to ensure robust performance estimation. SHAP analysis was applied for subject-level association interpretation. In one cohort, GAS5 and XIST expression features, along with GAS5,…
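The three evaluation schemes named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the data are synthetic (20 hypothetical subjects, 3 made-up features), a nearest-centroid rule stands in for the eight classifiers, the hold-out split is stratified by class, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group sample indices by class label."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def stratified_kfold(labels, k, seed=0):
    """Yield (train, test) index lists; each class is spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idxs in _indices_by_class(labels).values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def leave_one_out(n):
    """Yield (train, test) index lists where each sample is the test set once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def repeated_holdout(labels, test_frac, repeats, seed=0):
    """Yield `repeats` random splits, stratified by class (an assumption here)."""
    rng = random.Random(seed)
    groups = _indices_by_class(labels)
    for _ in range(repeats):
        test = []
        for idxs in groups.values():
            shuffled = idxs[:]
            rng.shuffle(shuffled)
            test.extend(shuffled[: max(1, round(test_frac * len(shuffled)))])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


def nearest_centroid_accuracy(X, y, train, test):
    """Fit class centroids on `train`, return accuracy on `test`."""
    sums, counts = {}, defaultdict(int)
    for i in train:
        acc = sums.setdefault(y[i], [0.0] * len(X[i]))
        for d, value in enumerate(X[i]):
            acc[d] += value
        counts[y[i]] += 1
    centroids = {c: [s / counts[c] for s in vec] for c, vec in sums.items()}

    def predict(row):
        # Pick the class whose centroid has the smallest squared distance.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def mean_accuracy(X, y, splits):
    """Average test accuracy over all splits produced by a scheme."""
    scores = [nearest_centroid_accuracy(X, y, tr, te) for tr, te in splits]
    return sum(scores) / len(scores)


# Synthetic stand-in data: label 0 = control, label 1 = T2D (hypothetical).
rng = random.Random(42)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(3.0 * label, 1.0) for _ in range(3)] for label in y]

results = {
    "stratified 5-fold": mean_accuracy(X, y, stratified_kfold(y, k=5)),
    "LOOCV": mean_accuracy(X, y, leave_one_out(len(y))),
    "repeated hold-out": mean_accuracy(X, y, repeated_holdout(y, 0.3, repeats=10)),
}
for name, acc in results.items():
    print(f"{name}: mean accuracy = {acc:.2f}")
```

Comparing the three averages side by side is one way to check that a performance estimate does not depend on a single split, which matters most for small cohorts. In practice, scikit-learn's `StratifiedKFold`, `LeaveOneOut`, and `StratifiedShuffleSplit` provide equivalent splitters.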
