Gextext: Disease Network Extraction from Biomedical Literature
Robert O'Shea

TL;DR
Gextext is an unsupervised method that extracts latent disease networks from biomedical literature, capturing complex relationships without requiring labeled data or large training sets.
Contribution
It introduces a novel unsupervised approach that identifies disease similarities and constructs disease networks directly from unstructured biomedical texts.
Findings
Gextext's disease networks correlate with semantic and gene profile similarities.
It outperforms GloVe in extracting disease relationships.
The method captures more information than is explicitly present in the text.
Abstract
PURPOSE: We propose a fully unsupervised method to learn latent disease networks directly from unstructured biomedical text corpora. This method addresses current challenges in unsupervised knowledge extraction, such as the detection of long-range dependencies and requirements for large training corpora. METHODS: Let C be a corpus of n text chunks. Let V be a set of p disease terms occurring in the corpus. Let X indicate the occurrence of V in C. Gextext identifies disease similarities by positively correlated occurrence patterns. This information is combined to generate a graph on which geodesic distance describes dissimilarity. Diseasomes were learned by Gextext and GloVe on corpora of 100-1000 PubMed abstracts. Similarity matrix estimates were validated against biomedical semantic similarity metrics and gene profile similarity. RESULTS: Geodesic distance on Gextext-inferred…
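The pipeline outlined under METHODS can be illustrated with a minimal sketch. It is not the author's implementation. The abstract does not say how correlations become edge weights, so the sketch assumes an edge of length 1 - r wherever the Pearson correlation r between two term columns is positive, and it computes geodesic distance with a plain Floyd-Warshall pass. The function names, the substring matching of terms, and the toy corpus are all hypothetical.

```python
import numpy as np

def occurrence_matrix(chunks, terms):
    """Binary n x p matrix X: X[i, j] = 1 if term j occurs in chunk i.
    Case-insensitive substring matching is an assumption of this sketch."""
    return np.array(
        [[float(t.lower() in c.lower()) for t in terms] for c in chunks]
    )

def geodesic_distances(X):
    """All-pairs shortest-path distances on a graph whose edges join
    positively correlated terms. Edge length 1 - r is an assumed choice,
    not something stated in the abstract."""
    # Pearson correlation between term columns; assumes every term occurs
    # in some but not all chunks, so no column is constant.
    R = np.corrcoef(X, rowvar=False)
    # Keep only positive correlations as edges; all other pairs are unlinked.
    D = np.where(R > 0, np.clip(1.0 - R, 0.0, None), np.inf)
    np.fill_diagonal(D, 0.0)
    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(D.shape[0]):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

# Hypothetical toy corpus: asthma and psoriasis never share a chunk.
chunks = [
    "Asthma and eczema often co-occur in children.",
    "Eczema severity tracks asthma risk.",
    "Eczema and psoriasis share inflammatory pathways.",
    "Psoriasis is sometimes misdiagnosed as eczema.",
    "Obesity is a risk factor for diabetes.",
    "Diabetes management in obesity.",
]
terms = ["asthma", "eczema", "psoriasis", "diabetes", "obesity"]

X = occurrence_matrix(chunks, terms)
D = geodesic_distances(X)
print(np.round(D, 2))
```

In this toy corpus asthma and psoriasis are never mentioned together, yet they end up at a finite geodesic distance because both correlate positively with eczema. The diabetes and obesity pair stays disconnected from the other three terms (infinite distance). This mirrors, in a very small way, the claim that the graph can carry relationships not stated directly in any single chunk.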
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
Methods: GloVe Embeddings
