Predicting Disease-Gene Associations using Cross-Document Graph-based Features
Hendrik ter Horst, Matthias Hartung, Roman Klinger, Matthias Zwick, Philipp Cimiano

TL;DR
This paper introduces a graph-based machine learning method leveraging cross-document features to improve the accuracy of predicting disease-gene associations from text, surpassing simple co-occurrence approaches.
Contribution
It presents a novel RDF graph representation of disease-gene and gene-gene interactions combined with a classifier to distinguish valid associations from spurious ones.
Findings
30-point F1 score improvement over baseline
Effective filtering of spurious associations
Enhanced detection of relevant disease-gene links
Abstract
In the context of personalized medicine, text mining methods pose an interesting option for identifying disease-gene associations, as they can be used to generate novel links between diseases and genes which may complement knowledge from structured databases. The most straightforward approach to extract such links from text is to rely on a simple assumption postulating an association between all genes and diseases that co-occur within the same document. However, this approach (i) tends to yield a number of spurious associations, (ii) does not capture different relevant types of associations, and (iii) is incapable of aggregating knowledge that is spread across documents. Thus, we propose an approach in which disease-gene co-occurrences and gene-gene interactions are represented in an RDF graph. A machine learning-based classifier is trained that incorporates features extracted from the…
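The idea of aggregating evidence across documents can be illustrated with a small sketch. It is not the authors' implementation: the toy corpus, the predicate names `cooccursWith` and `interactsWith`, and the two features are assumptions chosen for illustration, and plain tuples stand in for an RDF store. The abstract excerpt does not list the actual features used by the classifier.

```python
from itertools import product

# Toy corpus (hypothetical identifiers). Each document lists the diseases and
# genes it mentions and any gene-gene interactions it states.
DOCS = {
    "doc1": {"diseases": {"D1"}, "genes": {"G1"}, "interactions": set()},
    "doc2": {"diseases": set(), "genes": {"G1", "G2"}, "interactions": {("G1", "G2")}},
    "doc3": {"diseases": {"D1"}, "genes": {"G3"}, "interactions": set()},
}


def build_triples(docs):
    """Collect RDF-style (subject, predicate, object) triples from all documents."""
    triples = set()
    for doc in docs.values():
        # Document-level co-occurrence: every disease paired with every gene.
        for disease, gene in product(doc["diseases"], doc["genes"]):
            triples.add((disease, "cooccursWith", gene))
        # Gene-gene interactions are stored in both directions.
        for g1, g2 in doc["interactions"]:
            triples.add((g1, "interactsWith", g2))
            triples.add((g2, "interactsWith", g1))
    return triples


def pair_features(triples, disease, gene):
    """Two illustrative (assumed) features for a candidate disease-gene pair."""
    cooccurring = {o for s, p, o in triples if s == disease and p == "cooccursWith"}
    partners = {o for s, p, o in triples if s == gene and p == "interactsWith"}
    return {
        # 1 if the pair co-occurs in at least one document (the simple baseline signal).
        "direct_cooccurrence": int(gene in cooccurring),
        # Number of interaction partners of the gene that co-occur with the disease;
        # this evidence may come from different documents.
        "interacting_genes_linked_to_disease": len(partners & cooccurring),
    }


TRIPLES = build_triples(DOCS)
print(pair_features(TRIPLES, "D1", "G2"))
print(pair_features(TRIPLES, "D1", "G3"))
```

In this toy data, D1 and G2 never appear in the same document, so a pure co-occurrence approach finds no link between them. The graph still connects them through G1, which co-occurs with D1 in one document and interacts with G2 in another. Feature vectors of this kind could then be passed to any standard classifier.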
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
