Gextext: Disease Network Extraction from Biomedical Literature

Robert O'Shea

arXiv:1911.02562 · cs.DL · December 18, 2019 · 1 citation

Open Access

TL;DR

Gextext is an unsupervised method that extracts latent disease networks from biomedical literature, capturing complex relationships without requiring labeled data or large training sets.

Contribution

It introduces a novel unsupervised approach that identifies disease similarities and constructs disease networks directly from unstructured biomedical texts.

Findings

01

Gextext's disease networks correlate with semantic and gene profile similarities.

02

It outperforms GloVe in extracting disease relationships.

03

The method captures more information than is explicitly present in the text.

Abstract

PURPOSE: We propose a fully unsupervised method to learn latent disease networks directly from unstructured biomedical text corpora. This method addresses current challenges in unsupervised knowledge extraction, such as the detection of long-range dependencies and requirements for large training corpora. METHODS: Let C be a corpus of n text chunks. Let V be a set of p disease terms occurring in the corpus. Let X indicate the occurrence of V in C. Gextext identifies disease similarities by positively correlated occurrence patterns. This information is combined to generate a graph on which geodesic distance describes dissimilarity. Diseasomes were learned by Gextext and GloVe on corpora of 100-1000 PubMed abstracts. Similarity matrix estimates were validated against biomedical semantic similarity metrics and gene profile similarity. RESULTS: Geodesic distance on Gextext-inferred…
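
The pipeline outlined in METHODS (occurrence indicators X, positively correlated occurrence patterns, a graph, geodesic distance as dissimilarity) can be illustrated with a small sketch. This is not the author's implementation: the abstract does not state the correlation estimator, any threshold, or edge weighting, so the code below assumes Pearson correlation on binary indicators, an edge wherever the correlation exceeds zero, and unweighted hop count as the geodesic distance. The function names, the toy chunks, and the substring matching of terms are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations
import math


def occurrence_matrix(chunks, terms):
    """X[i][j] = 1 if term j appears in chunk i (case-insensitive substring match)."""
    return [[1 if t.lower() in c.lower() else 0 for t in terms] for c in chunks]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def build_graph(X, terms, threshold=0.0):
    """Connect two terms when their occurrence columns correlate above `threshold`.

    The zero threshold is an assumption, not a value taken from the paper.
    """
    columns = list(zip(*X))
    adjacency = {t: set() for t in terms}
    for i, j in combinations(range(len(terms)), 2):
        if pearson(columns[i], columns[j]) > threshold:
            adjacency[terms[i]].add(terms[j])
            adjacency[terms[j]].add(terms[i])
    return adjacency


def geodesic_distances(adjacency, source):
    """Breadth-first search: hop count from `source` to every reachable term."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Toy corpus (invented for illustration, not data from the paper).
chunks = [
    "Asthma frequently co-occurs with eczema in children.",
    "Eczema severity predicts later asthma.",
    "Psoriasis and eczema share inflammatory pathways.",
    "Topical therapy for eczema and psoriasis.",
    "Hypertension guidelines were revised.",
    "Statins reduce cardiovascular risk.",
]
terms = ["asthma", "eczema", "psoriasis"]

X = occurrence_matrix(chunks, terms)
graph = build_graph(X, terms)
print(geodesic_distances(graph, "asthma"))
# {'asthma': 0, 'eczema': 1, 'psoriasis': 2}
```

In this toy corpus asthma and psoriasis never share a chunk, so their occurrence patterns are negatively correlated and no direct edge is drawn. They are still linked at distance 2 through eczema, which is one way a graph distance can relate terms that never co-occur.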

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques

Methods: GloVe Embeddings