Extracting Concepts for Precision Oncology from the Biomedical Literature

Nicholas Greenspan; Yuqi Si; Kirk Roberts

arXiv:2010.00074·cs.AI·October 2, 2020·1 cites

TL;DR

This paper presents a dataset and NLP methods for extracting key concepts related to precision oncology from biomedical literature, aiming to improve information retrieval and analysis in this domain.

Contribution

It introduces a new annotated corpus and evaluates BERT-based models for concept extraction in precision oncology literature.

Findings

01

Best BERT model achieved 63.8% precision

02

Recall reached 71.9%, F1 score 67.1%

03

Proposes future research directions for system improvement

Abstract

This paper describes an initial dataset and automatic natural language processing (NLP) method for extracting concepts related to precision oncology from biomedical research articles. We extract five concept types: Cancer, Mutation, Population, Treatment, and Outcome. A corpus of 250 biomedical abstracts was annotated with these concepts following standard double-annotation procedures. We then experiment with BERT-based models for concept extraction. The best-performing model achieved a precision of 63.8%, a recall of 71.9%, and an F1 of 67.1%. Finally, we propose additional directions for research on improving extraction performance and on using the NLP system in downstream precision oncology applications.
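
To make the task and its metrics concrete, here is a minimal sketch of how concept extraction over the five types could be framed as token tagging and scored. The abstract does not say which tagging scheme or matching criterion the authors used, so the BIO labels, the exact-match span scoring, and the helper names `bio_to_spans` and `prf` are all assumptions for illustration. The tag sequences are invented and are not taken from the corpus.

```python
from typing import List, Set, Tuple

# The five concept types named in the abstract.
CONCEPT_TYPES = ["Cancer", "Mutation", "Population", "Treatment", "Outcome"]

# Assumed BIO label set: one B- and one I- label per type, plus "O".
LABELS = ["O"] + [f"{p}-{t}" for t in CONCEPT_TYPES for p in ("B", "I")]

# A span is (start_token, end_token_exclusive, concept_type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed spans."""
    spans: Set[Span] = set()
    start, ctype = None, None
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and ctype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-"):
            start, ctype = i, tag[2:]
    return spans


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 (an assumed criterion)."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Invented example: gold vs. predicted tags for one eight-token sentence.
gold_tags = ["B-Mutation", "I-Mutation", "O", "B-Cancer", "O", "O", "O", "B-Treatment"]
pred_tags = ["B-Mutation", "I-Mutation", "O", "B-Cancer", "B-Population", "O", "O", "O"]

precision, recall, f1 = prf(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case two of three predicted spans match two of three gold spans, so precision, recall, and F1 are all 2/3. Scores reported over a whole corpus may be aggregated differently (for example, averaged per concept type), so a reported F1 need not equal the harmonic mean of the reported overall precision and recall.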

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques