Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation

Qiwei Shao; Fengran Mo; Jian-Yun Nie

arXiv:2407.02719·cs.CL·July 4, 2024

Open Access

TL;DR

This paper enhances biomedical concept extraction by using rule-based data augmentation with MetaMapLite to generate pseudo-annotations, improving model training despite limited domain-specific data.

Contribution

It introduces a novel approach of leveraging a rule-based system for data augmentation to improve biomedical concept extraction models.

Findings

01 Augmenting the limited training data with pseudo-annotations improves concept extraction.

02 Models trained with rule-based augmentation outperform baseline models trained without it.

03 The pseudo-annotated data are generated from PubMed and PMC documents.

Abstract

Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document. Recent advancements have adapted pre-trained language models for this task. However, the scarcity of domain-specific data and the deviation of concepts from their canonical names often hinder these models' effectiveness. To tackle this issue, we employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC. The annotated data are used to augment the limited training data. Through extensive experiments, this study demonstrates the utility of a manually crafted concept mapping tool for training a better concept extraction model.
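
The augmentation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call MetaMapLite: a toy substring lexicon stands in for the rule-based mapper, and the names `TOY_LEXICON`, `Example`, `rule_based_annotate`, and `augment`, as well as the concept IDs, are hypothetical placeholders introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> concept ID.
# The IDs are placeholders, not real UMLS identifiers.
TOY_LEXICON = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "aspirin": "C_ASPIRIN",
}


@dataclass(frozen=True)
class Example:
    text: str
    concepts: frozenset   # document-level set of concept IDs
    is_pseudo: bool       # True if produced by the rule-based annotator


def rule_based_annotate(text, lexicon=TOY_LEXICON):
    """Stand-in for a rule-based mapper: case-insensitive substring lookup."""
    lowered = text.lower()
    return frozenset(cid for surface, cid in lexicon.items() if surface in lowered)


def augment(gold, unlabeled_docs):
    """Return the gold examples followed by pseudo-annotated examples."""
    pseudo = []
    for doc in unlabeled_docs:
        concepts = rule_based_annotate(doc)
        if concepts:  # skip documents where no concept was matched
            pseudo.append(Example(doc, concepts, is_pseudo=True))
    return list(gold) + pseudo


gold = [Example("Aspirin reduces risk.", frozenset({"C_ASPIRIN"}), False)]
unlabeled = ["Patient had a heart attack.", "No relevant terms here."]
training_set = augment(gold, unlabeled)
```

Keeping an `is_pseudo` flag on each example is one possible design choice: it lets a training loop weight or filter the noisier rule-generated labels separately from the manually annotated ones.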

Taxonomy

Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies