Generalizable and Scalable Multistage Biomedical Concept Normalization Leveraging Large Language Models
Nicholas J Dobbins

TL;DR
This study demonstrates that large language models, both proprietary and open-source, can significantly enhance biomedical concept normalization performance when integrated with existing rule-based systems without the need for fine-tuning.
Contribution
The paper introduces a two-step LLM integration approach that improves biomedical concept normalization by generating alternative phrasings and pruning candidate concepts, showing substantial performance gains.
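The two-step integration described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `normalize_multistage`, the toy lexicon, the placeholder concept IDs (`CUI_MI`, `CUI_ATTACK`), and the stub paraphraser and pruner are all hypothetical stand-ins for a real rule-based normalizer and real LLM calls.

```python
from typing import Callable, Dict, List, Set

# Hypothetical type aliases; none of these names come from the paper.
Normalizer = Callable[[str], Set[str]]        # phrase -> candidate concept IDs
Paraphraser = Callable[[str], List[str]]      # phrase -> alternative phrasings
Pruner = Callable[[str, Set[str]], Set[str]]  # (phrase, candidates) -> kept IDs


def normalize_multistage(
    phrase: str,
    normalizer: Normalizer,
    paraphrase: Paraphraser,
    prune: Pruner,
) -> Set[str]:
    """Expand a phrase with alternative phrasings, then prune the candidates."""
    # Step 1: query the rule-based normalizer with the original phrase
    # plus every alternative phrasing proposed by the LLM.
    candidates: Set[str] = set()
    for variant in [phrase] + paraphrase(phrase):
        candidates |= normalizer(variant)
    # Step 2: let the LLM discard candidates that do not fit the source phrase.
    return prune(phrase, candidates)


# Toy stand-ins so the sketch runs without any external service.
# The concept IDs are made-up placeholders, not real UMLS CUIs.
TOY_LEXICON: Dict[str, Set[str]] = {
    "myocardial infarction": {"CUI_MI"},
    "attack": {"CUI_ATTACK"},
}


def toy_normalizer(phrase: str) -> Set[str]:
    # Exact-match dictionary lookup, standing in for a rule-based system.
    return set(TOY_LEXICON.get(phrase.lower(), set()))


def toy_paraphrase(phrase: str) -> List[str]:
    # A real system would prompt an LLM here; this stub returns fixed strings.
    return ["myocardial infarction", "attack"]


def toy_prune(phrase: str, candidates: Set[str]) -> Set[str]:
    # A real system would ask an LLM which candidates match the phrase.
    return candidates & {"CUI_MI"}


result = normalize_multistage(
    "cardiac infarct", toy_normalizer, toy_paraphrase, toy_prune
)
```

In this toy run the rule-based lookup finds nothing for the original phrase, the alternative phrasings surface two candidates, and the pruning step keeps only the one that matches. Passing the normalizer as a plain callable reflects the idea that the LLM steps wrap an existing system rather than replace it.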
Findings
GPT-3.5-turbo improves F1 by up to 10.9 points.
The open-source Vicuna model achieves an increase of up to 18.7 points in F1.
Large general-purpose LLMs can be effectively used without fine-tuning.
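For readers unfamiliar with how a gain "in F1 points" is computed, the sketch below shows the standard F1 formula and a point difference on a 0 to 100 scale. The counts are invented purely for illustration and are not results from the paper; `f1_score` and `f1_gain_points` are hypothetical helper names.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_gain_points(baseline: float, improved: float) -> float:
    """Difference between two F1 scores, expressed on a 0-100 scale."""
    return 100 * (improved - baseline)


# Invented counts for illustration only (NOT taken from the paper).
baseline_f1 = f1_score(tp=60, fp=20, fn=40)  # rule-based system alone
with_llm_f1 = f1_score(tp=75, fp=25, fn=25)  # same system plus LLM steps
gain = f1_gain_points(baseline_f1, with_llm_f1)
```

A gain reported this way is an absolute difference in percentage points, not a relative percentage improvement.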
Abstract
Background: Biomedical entity normalization is critical to biomedical research because the richness of free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs), in turn, have shown great potential and high performance in a variety of natural language processing (NLP) tasks, but their application for normalization remains understudied. Methods: We applied both proprietary and open-source LLMs in combination with several rule-based normalization systems commonly used in biomedical research. We used a two-step LLM integration approach, (1) using an LLM to generate alternative phrasings of a source utterance, and (2) to prune candidate UMLS concepts, using a variety of prompting methods. We measure results by , where we…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Radiomics and Machine Learning in Medical Imaging
Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Softmax · Layer Normalization · Cosine Annealing
