Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction
Rita T. Sousa, Heiko Paulheim

TL;DR
This paper introduces a novel method that combines multiple gene expression datasets and domain knowledge via knowledge graphs to enhance diabetes prediction accuracy.
Contribution
It presents a new approach integrating heterogeneous gene expression data with knowledge graphs and KG embeddings for improved disease prediction.
Findings
Enhanced diabetes prediction accuracy with integrated data
Effective use of knowledge graphs for biomedical data fusion
Improved classifier performance using KG-based features
Abstract
Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types such as gene expression data. While gene expression data can provide valuable insights, two challenges arise: sample sizes in expression datasets are usually limited, and data from different datasets covering different genes cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs (KGs), a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, which serve as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in…
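To make the integration idea concrete, here is a minimal sketch of how two expression datasets that measure different genes could be placed in one graph and linked through shared domain annotations. It is not the authors' implementation: the abstract does not state the graph schema, how expression values are encoded, or which embedding method and classifier are used. The dataset contents, gene and term names, relation names, the `threshold` rule, and the helper functions are all hypothetical. The embedding and classification steps are left out.

```python
from collections import defaultdict, deque

# Toy data. Every patient, gene, term, and value below is invented.
dataset_a = {"patient_a1": {"GENE1": 2.4, "GENE2": 0.1},
             "patient_a2": {"GENE1": 0.2, "GENE2": 1.9}}
dataset_b = {"patient_b1": {"GENE3": 3.0, "GENE4": 0.3}}

# Hypothetical domain knowledge: gene -> annotation term.
annotations = {"GENE1": "TERM_X", "GENE3": "TERM_X",
               "GENE2": "TERM_Y", "GENE4": "TERM_Z"}


def build_triples(datasets, annotations, threshold=1.0):
    """Return (subject, relation, object) triples for all datasets.

    Assumption: a patient is linked to a gene when its expression
    value is at or above `threshold`. The paper may encode this differently.
    """
    triples = []
    for samples in datasets:
        for patient, expression in samples.items():
            for gene, value in expression.items():
                if value >= threshold:
                    triples.append((patient, "expresses", gene))
    for gene, term in annotations.items():
        triples.append((gene, "annotatedWith", term))
    return triples


def adjacency(triples):
    """Undirected adjacency sets, ignoring relation labels."""
    adj = defaultdict(set)
    for subject, _, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    return adj


def shortest_path_length(adj, start, goal):
    """Breadth-first search; returns the hop count or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


triples = build_triples([dataset_a, dataset_b], annotations)
adj = adjacency(triples)
# patient_a1 and patient_b1 share no measured gene, yet both reach TERM_X.
print(shortest_path_length(adj, "patient_a1", "patient_b1"))
```

In this toy graph, patients from the two datasets have no genes in common but become connected through the shared annotation term. A KG embedding method applied to such a graph could, in principle, place those patients in one vector space for a downstream classifier.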
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Machine Learning in Bioinformatics
