Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature
Armando D. Diaz Gonzalez, Kevin S. Hughes, Songhui Yue, Sean T. Hayes

TL;DR
This paper introduces SimpleGermKG, an automated method using BioBERT to extract and connect germline genes and diseases from biomedical literature, creating a comprehensive knowledge graph for biomedical research.
Contribution
It presents a novel approach combining BioBERT with ontology and rule-based algorithms to construct a germline gene-disease knowledge graph from literature.
Findings
Knowledge graph contains 297 genes, 130 diseases, and 46,747 triples.
Effective visualization of gene-disease relationships achieved.
Highlights limitations and future challenges in germline knowledge extraction.
Abstract
Published biomedical information has increased rapidly and continues to do so. Recent advances in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts to construct knowledge graphs from the immense body of work that has been done in this area for genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole…
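The normalization and triple-building steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym tables, the `normalize` and `build_triples` helpers, the predicate names `mentions_gene` and `mentions_disease`, and the article identifier are all hypothetical, and the entity mentions are assumed to have already been extracted by a named-entity recognizer such as BioBERT.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym tables standing in for a real ontology lookup.
# Keys are rule-normalized surface forms; values are canonical names.
GENE_SYNONYMS: Dict[str, str] = {
    "brca1": "BRCA1",
    "breast cancer 1": "BRCA1",
}
DISEASE_SYNONYMS: Dict[str, str] = {
    "breast cancer": "Breast Cancer",
    "breast carcinoma": "Breast Cancer",
}

Triple = Tuple[str, str, str]


def normalize(mention: str, table: Dict[str, str]) -> Optional[str]:
    """Apply simple rules, then look the mention up in the synonym table.

    Rules (illustrative only): drop hyphens, collapse whitespace, lowercase.
    Returns None when the mention cannot be mapped to a canonical term.
    """
    key = re.sub(r"\s+", " ", mention.replace("-", "")).strip().lower()
    return table.get(key)


def build_triples(
    article_id: str,
    gene_mentions: List[str],
    disease_mentions: List[str],
) -> List[Triple]:
    """Link an article to each normalized gene and disease it mentions.

    Duplicate mentions collapse to one triple; unmapped mentions are dropped.
    """
    triples = set()
    for mention in gene_mentions:
        gene = normalize(mention, GENE_SYNONYMS)
        if gene is not None:
            triples.add((article_id, "mentions_gene", gene))
    for mention in disease_mentions:
        disease = normalize(mention, DISEASE_SYNONYMS)
        if disease is not None:
            triples.add((article_id, "mentions_disease", disease))
    return sorted(triples)


if __name__ == "__main__":
    for triple in build_triples(
        "article-001",
        ["BRCA-1", "brca1", "not-a-gene"],
        ["Breast  carcinoma"],
    ):
        print(triple)
```

In this sketch a gene and a disease become connected in the graph only through the article node they share, which is one simple way to read the article-to-entity relationships the abstract describes.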
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Bioinformatics and Genomic Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Dropout · Weight Decay · Adam · Linear Warmup With Linear Decay · Layer Normalization
