Leveraging Large Language Models for Rare Disease Named Entity Recognition

Nan Miles Xi; Yu Deng; Lin Wang

arXiv:2508.09323 · cs.CL · December 30, 2025

Open Access

TL;DR

This paper evaluates GPT-4o's effectiveness in rare disease NER under low-resource conditions, demonstrating that prompt-based strategies and fine-tuning can outperform traditional models like BioClinicalBERT.

Contribution

It introduces a structured prompting framework and semantically guided example selection methods, advancing LLM application in rare disease NER with minimal labeled data.

Findings

01 GPT-4o achieves performance competitive with or superior to BioClinicalBERT.

02 Task-level fine-tuning yields the best results among the evaluated methods.

03 Few-shot prompting offers a favorable cost-performance trade-off.

Abstract

Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions. In this study, we evaluate the capabilities of GPT-4o for rare disease NER under low-resource settings, using a range of prompt-based strategies including zero-shot prompting, few-shot in-context learning, retrieval-augmented generation (RAG), and task-level fine-tuning. We design a structured prompting framework that encodes domain-specific knowledge and disambiguation rules for four entity types. We further introduce two semantically guided few-shot example selection methods to improve in-context performance while reducing labeling effort. Experiments on the RareDis Corpus show that GPT-4o achieves competitive or superior performance compared to BioClinicalBERT, with task-level fine-tuning yielding the…
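
The abstract mentions semantically guided few-shot example selection but does not describe the two methods here. As a rough illustration of the general idea only, the sketch below picks the labeled examples most similar to the input sentence and assembles them into a few-shot prompt. Everything in it is an assumption rather than the authors' method: the bag-of-words cosine similarity stands in for a real sentence embedding, and the function names (`select_examples`, `build_prompt`), the instruction text, the example sentences, and the bracketed entity labels are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words counts; a simple stand-in for a sentence embedding (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, pool, k=2):
    """Return the k labeled examples whose text is most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Assemble a few-shot prompt; the instruction wording is hypothetical."""
    lines = ["Extract the named entities from the text."]
    for ex in examples:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Entities: {ex['entities']}")
    lines.append(f"Text: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


# Hypothetical labeled pool; the annotation format is illustrative only.
pool = [
    {"text": "Patients with cystic fibrosis often present with chronic cough.",
     "entities": "cystic fibrosis [DISEASE]; chronic cough [SYMPTOM]"},
    {"text": "The stock market fell sharply on Monday.",
     "entities": "none"},
    {"text": "Cystic fibrosis is an inherited disorder affecting the lungs.",
     "entities": "cystic fibrosis [DISEASE]"},
]

query = "Cystic fibrosis causes persistent cough and lung infections."
selected = select_examples(query, pool, k=2)
prompt = build_prompt(query, selected)
print(prompt)
```

In practice the similarity function would likely be replaced by a dense embedding model, and the resulting prompt would be sent to the LLM; both steps are left out so the sketch stays self-contained.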

Taxonomy

Topics: Topic Modeling · Genomics and Rare Diseases · Biomedical Text Mining and Ontologies