Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets
Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov, Rada Chirkova, Alexander Tropsha

TL;DR
This paper presents a pipeline that uses language models to extract candidate drug-disease treatment pairs from unstructured spoken text and validates them against a medical knowledge graph. 30.4% of the proposed pairs were found in ROBOKOP.
Contribution
The study introduces a modular, adaptable pipeline leveraging pre-trained language models and validation tools to identify potential disease treatments from unstructured spoken text sources.
Findings
30.4% of proposed pairs found in ROBOKOP database
Successfully identified Omeprazole as a treatment for heartburn
Pipeline is adaptable to various unstructured text sources
Abstract
Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs representing potential cures through simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline that systematically pre-processes the text to ensure high-quality input to the core classification model, whose output feeds a series of post-processing steps that filter the results. The classification model itself uses a language model pre-trained on PubMed text. Because the pipeline is modular, future work can substitute higher-quality components at any stage. As a validation measure, we use ROBOKOP, an engine over a medical knowledge graph with only validated pathways,…
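The abstract describes a chain of swappable stages: pre-processing, classification, post-processing, then validation. The sketch below illustrates that modular structure under stated assumptions; it is not the authors' implementation. The rule-based `classify` is a hypothetical stand-in for the PubMed-pretrained language model, the filler-word list and pronoun filter are assumed cleaning rules, the transcript is toy text rather than real Google Cloud output, and `known` is a hard-coded set standing in for a ROBOKOP query. `validation_rate` assumes the reported 30.4% figure is the share of proposed pairs found in the knowledge graph.

```python
import re
from dataclasses import dataclass
from typing import Callable

Pair = tuple[str, str]  # (drug, disease)

# Assumed filler tokens a pre-processing stage might drop from transcripts.
FILLERS = {"um", "uh", "er"}
# Assumed post-processing rule: pronouns are not valid drug names.
PRONOUNS = {"it", "this", "that", "which"}


def preprocess(transcript: str) -> list[str]:
    """Split a transcript into sentences and drop filler tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    cleaned = []
    for s in sentences:
        words = [w for w in s.split() if w.lower().strip(",.") not in FILLERS]
        if words:
            cleaned.append(" ".join(words))
    return cleaned


def classify(sentences: list[str]) -> list[Pair]:
    """Hypothetical rule-based stand-in for the language-model classifier."""
    pattern = re.compile(r"\b(\w+)\s+(?:treats|relieves|cures)\s+(\w+)", re.IGNORECASE)
    pairs = []
    for s in sentences:
        for drug, disease in pattern.findall(s):
            pairs.append((drug.lower(), disease.lower()))
    return pairs


def postprocess(pairs: list[Pair]) -> list[Pair]:
    """Drop pronoun 'drugs' and duplicate pairs, keeping first-seen order."""
    seen: set[Pair] = set()
    out = []
    for p in pairs:
        if p[0] in PRONOUNS or p in seen:
            continue
        seen.add(p)
        out.append(p)
    return out


@dataclass
class Pipeline:
    """Each stage is a plain callable, so any one can be swapped out."""
    preprocess: Callable[[str], list[str]]
    classify: Callable[[list[str]], list[Pair]]
    postprocess: Callable[[list[Pair]], list[Pair]]

    def run(self, transcript: str) -> list[Pair]:
        return self.postprocess(self.classify(self.preprocess(transcript)))


def validation_rate(pairs: list[Pair], known: set[Pair]) -> float:
    """Fraction of proposed pairs present in a reference set."""
    if not pairs:
        return 0.0
    return sum(p in known for p in pairs) / len(pairs)


# Toy input for illustration only; not data from the paper.
transcript = ("Um, omeprazole relieves heartburn. It treats reflux. "
              "Omeprazole relieves heartburn. Ginger cures nausea.")
known = {("omeprazole", "heartburn")}  # stand-in for a knowledge-graph lookup

pipeline = Pipeline(preprocess, classify, postprocess)
pairs = pipeline.run(transcript)
print(pairs, validation_rate(pairs, known))
```

Replacing `classify` with a model-backed function of the same signature leaves the rest of the pipeline unchanged, which is the property the abstract highlights.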
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
