Semi-Automating Knowledge Base Construction for Cancer Genetics
Somin Wadhwa, Kanhua Yin, Kevin S. Hughes, Byron C. Wallace

TL;DR
This paper develops models to automate the extraction of key information from full-text cancer genetics articles, aiming to accelerate the manual process of building a knowledge base for cancer risk assessment.
Contribution
It introduces two novel extraction tasks and a transformer-based joint model, demonstrating strong empirical results for automating knowledge base construction in cancer genetics.
Findings
Joint extraction model outperforms pipelined approaches
Strong empirical performance in extracting risk estimates
Potential to significantly expedite knowledge base creation
Abstract
In this work, we consider the exponentially growing subarea of genetics in cancer. The need to synthesize and centralize this evidence for dissemination has motivated a team of physicians to manually construct and maintain a knowledge base that distills key results reported in the literature. This is a laborious process that entails reading through full-text articles to understand the study design, assess study quality, and extract the reported cancer risk estimates associated with particular hereditary cancer genes (i.e., penetrance). We propose models to automatically surface key elements from full-text cancer genetics articles, with the ultimate aim of expediting the manual workflow currently in place. We introduce two challenging tasks that are critical for characterizing the findings reported in cancer genetics studies: (i) Extracting snippets of text that describe…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
