Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré

TL;DR
This paper introduces a cross-domain data integration approach that improves biomedical named entity disambiguation by transferring structural knowledge from a general text knowledge base, with the largest accuracy gains on rare entities.
Contribution
The work proposes a novel method for transferring structural knowledge from a general text knowledge base to the biomedical domain, uses it to generate a large biomedical NED dataset for pretraining, and achieves state-of-the-art results in medical NED.
Findings
Achieved state-of-the-art performance on MedMentions and BC5CDR datasets.
Improved disambiguation accuracy for rare entities by up to 57 points.
Generated a large biomedical NED dataset for pretraining.
Abstract
Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve…
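The abstract does not spell out the integration scheme, so the following is only a toy sketch of the general idea of augmenting coarse-grained medical structural resources with finer-grained ones from a general knowledge base. All identifiers, type labels, the `crosswalk` mapping, and the `augment_types` helper are hypothetical and invented for illustration; they are not the authors' data or code.

```python
from typing import Dict, Optional, Set

# Hypothetical medical KB: entity ID -> coarse-grained types (illustrative only).
medical_kb: Dict[str, Set[str]] = {
    "MED:001": {"disease"},
    "MED:002": {"chemical"},
    "MED:003": {"disease"},  # has no counterpart in the general KB below
}

# Hypothetical general-domain KB: entity ID -> finer-grained types.
general_kb: Dict[str, Set[str]] = {
    "GEN:A": {"infectious disease", "viral disease"},
    "GEN:B": {"analgesic"},
}

# Hypothetical cross-references linking medical entities to general entities.
crosswalk: Dict[str, str] = {"MED:001": "GEN:A", "MED:002": "GEN:B"}


def augment_types(
    medical: Dict[str, Set[str]],
    general: Dict[str, Set[str]],
    links: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Return medical entities with types merged in from linked general entities."""
    augmented: Dict[str, Set[str]] = {}
    for med_id, coarse_types in medical.items():
        gen_id: Optional[str] = links.get(med_id)
        # Entities without a link (or with a dangling link) keep only their coarse types.
        fine_types = general.get(gen_id, set()) if gen_id is not None else set()
        augmented[med_id] = set(coarse_types) | fine_types
    return augmented


augmented_kb = augment_types(medical_kb, general_kb, crosswalk)
```

In this sketch, a linked entity such as `MED:001` ends up with both its original coarse type and the finer-grained types of its general-domain counterpart, while an unlinked entity such as `MED:003` is left unchanged. A real system would also need a way to produce the cross-references themselves, which is assumed as given here.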
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
