Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
Qiwei Shao, Fengran Mo, Jian-Yun Nie

TL;DR
This paper enhances biomedical concept extraction by using rule-based data augmentation with MetaMapLite to generate pseudo-annotations, improving model training despite limited domain-specific data.
Contribution
It introduces a novel approach of leveraging a rule-based system for data augmentation to improve biomedical concept extraction models.
Findings
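Training on the augmented data yields a better concept extraction model than training on the limited annotated data alone.
Models trained with rule-based pseudo-annotations outperform the baseline models.
The pseudo-annotated data are generated from PubMed and PMC documents.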
Abstract
Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document. Recent advancements have adapted pre-trained language models for this task. However, the scarcity of domain-specific data and the deviation of concepts from their canonical names often hinder these models' effectiveness. To tackle this issue, we employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC. The annotated data are used to augment the limited training data. Through extensive experiments, this study demonstrates the utility of a manually crafted concept mapping tool for training a better concept extraction model.
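The augmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and it does not call MetaMapLite: a tiny dictionary matcher stands in for the rule-based mapper, and the lexicon, the concept IDs, and the names `Example`, `rule_based_concepts`, and `augment` are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon (surface form -> placeholder concept ID).
# A real system such as MetaMapLite maps text to UMLS concepts with far richer rules.
TOY_LEXICON = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "aspirin": "C_ASPIRIN",
}


@dataclass(frozen=True)
class Example:
    text: str
    concepts: frozenset  # document-level set of concept IDs
    is_pseudo: bool      # True when the labels come from the rule-based mapper


def rule_based_concepts(text, lexicon=TOY_LEXICON):
    """Return the concept IDs whose surface forms occur in the text."""
    lowered = text.lower()
    found = set()
    for surface, concept_id in lexicon.items():
        # Whole-word, case-insensitive match of the surface form.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(concept_id)
    return frozenset(found)


def augment(gold_examples, unlabeled_texts):
    """Append pseudo-annotated examples to the manually annotated ones."""
    pseudo = []
    for text in unlabeled_texts:
        concepts = rule_based_concepts(text)
        if concepts:  # skip documents where the mapper finds nothing
            pseudo.append(Example(text, concepts, is_pseudo=True))
    return list(gold_examples) + pseudo
```

Keeping the `is_pseudo` flag on each example is one possible design choice: it would let a training loop weight or filter pseudo-annotations separately from gold annotations, though the source does not say whether the authors do so.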
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies
