Semi-Automating Knowledge Base Construction for Cancer Genetics

Somin Wadhwa; Kanhua Yin; Kevin S. Hughes; Byron C. Wallace

arXiv:2005.08146 · cs.CL · May 27, 2020

Open Access

TL;DR

This paper develops models to automate the extraction of key information from full-text cancer genetics articles, aiming to accelerate the manual process of building a knowledge base for cancer risk assessment.

Contribution

The paper introduces two novel extraction tasks and a transformer-based joint model, and reports strong empirical results for automating knowledge base construction in cancer genetics.

Findings

01 Joint extraction model outperforms pipelined approaches
02 Strong empirical performance in extracting risk estimates
03 Potential to significantly expedite knowledge base creation

Abstract

In this work, we consider the exponentially growing subarea of genetics in cancer. The need to synthesize and centralize this evidence for dissemination has motivated a team of physicians to manually construct and maintain a knowledge base that distills key results reported in the literature. This is a laborious process that entails reading through full-text articles to understand the study design, assess study quality, and extract the reported cancer risk estimates associated with particular hereditary cancer genes (i.e., penetrance). We propose models to automatically surface key elements from full-text cancer genetics articles, with the ultimate aim of expediting the manual workflow currently in place. We propose two challenging tasks that are critical for characterizing the findings reported in cancer genetics studies: (i) Extracting snippets of text that describe…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques