Medical Concept Normalization in a Low-Resource Setting

Tim Patzelt

arXiv:2409.14579 · cs.CL · September 24, 2024

Open Access

TL;DR

This paper investigates medical concept normalization in low-resource German biomedical texts. It shows that multilingual Transformer models outperform string similarity methods, although adding contextual information did not improve results.

Contribution

It introduces a new German medical forum dataset and evaluates Transformer models, highlighting their advantages and limitations in low-resource medical NLP tasks.

Findings

01

Multilingual Transformers outperform string similarity methods.

02

Contextual information did not improve normalization accuracy.

03

Error analysis suggests areas for future improvement.

Abstract

In the field of biomedical natural language processing, medical concept normalization is a crucial task for accurately mapping mentions of concepts to a large knowledge base. However, this task becomes even more challenging in low-resource settings, where limited data and resources are available. In this thesis, I explore the challenges of medical concept normalization in a low-resource setting. Specifically, I investigate the shortcomings of current medical concept normalization methods applied to German lay texts. Since there is no suitable dataset available, a dataset consisting of posts from a German medical online forum is annotated with concepts from the Unified Medical Language System. The experiments demonstrate that multilingual Transformer-based models are able to outperform string similarity methods. The use of contextual information to improve the normalization of lay…
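
To make the string-similarity baseline concrete, here is a minimal sketch of dictionary-based normalization using character trigram overlap. The abstract does not say which similarity measures were evaluated, so the Jaccard measure, the function names, and the toy lexicon with placeholder concept IDs (not real UMLS identifiers) are all assumptions made for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalize(mention, lexicon, n=3):
    """Map a mention to the concept whose name is most similar to it.

    `lexicon` maps a concept ID to a list of known names for that concept.
    Returns (concept_id, score) for the best-scoring name.
    """
    mention_grams = char_ngrams(mention, n)
    best_id, best_score = None, -1.0
    for concept_id, names in lexicon.items():
        for name in names:
            score = jaccard(mention_grams, char_ngrams(name, n))
            if score > best_score:
                best_id, best_score = concept_id, score
    return best_id, best_score


# Hypothetical toy lexicon: the IDs are placeholders, not UMLS concept identifiers.
TOY_LEXICON = {
    "CONCEPT_HEADACHE": ["Kopfschmerz", "Kopfschmerzen", "Cephalgie"],
    "CONCEPT_FEVER": ["Fieber", "Pyrexie"],
    "CONCEPT_NAUSEA": ["Übelkeit", "Nausea"],
}

if __name__ == "__main__":
    for mention in ["Kopfschmerzen", "Kopfweh"]:
        print(mention, normalize(mention, TOY_LEXICON))
```

An exact dictionary name scores 1.0, while a lay variant such as "Kopfweh" only shares its prefix with the dictionary entries and receives a much lower score. Lay language that diverges in surface form from the knowledge-base names is the kind of case where a purely string-based method has little to work with.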

Taxonomy

Topics: Biomedical Text Mining and Ontologies