# Leveraging large language models for rare disease named entity recognition

**Authors:** Nan Miles Xi, Yu Deng, Lin Wang

PMC · DOI: 10.1371/journal.pdig.0001242 · PLOS Digital Health · 2026-02-12

## TL;DR

This study evaluates how well large language models such as GPT-4o can identify rare disease-related terms in medical text when labeled data are limited. Task-level fine-tuning performs best, and few-shot prompting delivers strong results at low token cost.

## Contribution

The study introduces structured prompting and semantically guided few-shot example selection methods to enhance rare disease NER with GPT-4o under low-resource settings.
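
The two ideas in this contribution can be illustrated with a minimal sketch: rank a labeled pool by similarity to the query sentence, keep the top k as demonstrations, and assemble a prompt that opens with entity-type definitions. Everything here is an assumption for illustration. The paper's actual prompt wording, disambiguation rules, and selection methods are not reproduced. `ENTITY_TYPES`, `select_few_shot`, and `build_prompt` are hypothetical names. The four labels assume the RareDis types (rare disease, disease, sign, symptom). Bag-of-words cosine similarity stands in for the semantic embeddings a real system would more likely use.

```python
import math
import re
from collections import Counter

# Hypothetical glossary, assuming the four RareDis entity types; the paper's
# actual prompt text and disambiguation rules are not reproduced here.
ENTITY_TYPES = {
    "RAREDISEASE": "a named rare disease or syndrome",
    "DISEASE": "a disease or disorder that is not itself rare",
    "SIGN": "an objective finding observed or measured by a clinician",
    "SYMPTOM": "a subjective complaint reported by the patient",
}


def bag_of_words(text):
    """Term counts: a crude stand-in for a semantic sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_few_shot(query, pool, k=2):
    """Return the k labeled examples whose text is most similar to the query."""
    q = bag_of_words(query)
    return sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex["text"])), reverse=True)[:k]


def build_prompt(query, examples):
    """Assemble type definitions, selected demonstrations, and the query."""
    lines = ["Extract entities of the following types:"]
    lines += [f"- {name}: {desc}" for name, desc in ENTITY_TYPES.items()]
    for ex in examples:
        spans = "; ".join(f"{span} [{label}]" for span, label in ex["entities"])
        lines += [f"Text: {ex['text']}", f"Entities: {spans}"]
    lines += [f"Text: {query}", "Entities:"]
    return "\n".join(lines)


# Toy labeled pool (illustrative sentences, not drawn from RareDis).
pool = [
    {"text": "Fabry disease causes burning pain in the hands.",
     "entities": [("Fabry disease", "RAREDISEASE"), ("burning pain", "SYMPTOM")]},
    {"text": "Diabetes is a common metabolic disorder.",
     "entities": [("Diabetes", "DISEASE")]},
    {"text": "Marfan syndrome patients often show tall stature.",
     "entities": [("Marfan syndrome", "RAREDISEASE"), ("tall stature", "SIGN")]},
]
query = "Patients with Fabry disease report pain in the hands."
print(build_prompt(query, select_few_shot(query, pool, k=1)))
```

Selecting demonstrations per query, rather than fixing one set, is what lets a small labeled pool go further. Here the Fabry disease example wins because it shares the most vocabulary with the query.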

## Key findings

- Task-level fine-tuning of GPT-4o outperforms the previously reported BioClinicalBERT baseline on the RareDis Corpus.
- Few-shot prompting provides high performance at low token costs for rare disease NER.
- Retrieval-augmented generation improves recall for challenging entity types like signs and symptoms.
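
Per-type recall, the quantity behind the third finding, can be computed with strict entity-level matching. The following is a minimal sketch under that assumption. The paper's exact matching criterion and averaging scheme may differ, `per_type_scores` is a hypothetical helper, and the spans are toy data rather than results from the study.

```python
def per_type_scores(gold, pred):
    """Strict entity-level precision, recall, and F1 for each entity type.

    Spans are (start, end, type) tuples; a prediction counts as a true
    positive only if its offsets and type all match a gold span exactly.
    """
    types = {s[2] for s in gold} | {s[2] for s in pred}
    scores = {}
    for t in sorted(types):
        g = {s for s in gold if s[2] == t}
        p = {s for s in pred if s[2] == t}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[t] = (precision, recall, f1)
    return scores


# Toy example over "Fabry disease causes burning pain and angiokeratomas."
gold = [(0, 13, "RAREDISEASE"), (21, 33, "SYMPTOM"), (38, 52, "SIGN")]
pred = [(0, 13, "RAREDISEASE"), (21, 33, "SYMPTOM"), (29, 33, "SYMPTOM")]
for entity_type, (p, r, f) in per_type_scores(gold, pred).items():
    print(f"{entity_type:12s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy case, SIGN recall is zero because the sign was never predicted. SYMPTOM precision is halved by the partial "pain" span, even though recall is perfect. A method that adds correct sign or symptom spans raises recall for those types even if it also introduces some extra predictions.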

## Abstract

Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions. In this study, we evaluate the capabilities of GPT-4o for rare disease NER under low-resource settings, using a range of prompt-based strategies including zero-shot prompting, few-shot in-context learning, retrieval-augmented generation (RAG), and task-level fine-tuning. We design a structured prompting framework that encodes domain-specific knowledge and disambiguation rules for four entity types. We further introduce two semantically guided few-shot example selection methods to improve in-context performance while reducing labeling effort. Experiments on the RareDis Corpus show that GPT-4o achieves competitive or superior performance compared to BioClinicalBERT, with task-level fine-tuning yielding the strongest performance among the evaluated approaches and improving upon the previously reported BioClinicalBERT baseline. Cost-performance analysis reveals that few-shot prompting delivers high returns at low token budgets. RAG provides limited overall gains but can improve recall for challenging entity types, especially signs and symptoms. An error taxonomy highlights common failure modes such as boundary drift and type confusion, suggesting opportunities for post-processing and hybrid refinement. Our results demonstrate that prompt-optimized LLMs can serve as effective, scalable alternatives to traditional supervised models in biomedical NER, particularly in rare disease applications where annotated data is scarce.
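
The error taxonomy mentioned in the abstract (boundary drift, type confusion) can be approximated by comparing predicted and gold character spans. The sketch below shows one plausible set of rules, not the authors' definitions. `classify_errors` and its category names are hypothetical, and the sentence and spans are a toy example.

```python
from collections import Counter


def overlaps(a, b):
    """True if two half-open (start, end, ...) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def classify_errors(gold, pred):
    """Bucket predictions and misses into a simple, hypothetical error taxonomy."""
    counts = Counter()
    for p in pred:
        hits = [g for g in gold if overlaps(p, g)]
        if p in gold:
            counts["correct"] += 1
        elif any(g[:2] == p[:2] for g in hits):
            counts["type_confusion"] += 1      # right offsets, wrong label
        elif any(g[2] == p[2] for g in hits):
            counts["boundary_drift"] += 1      # right label, offsets off
        elif hits:
            counts["boundary_and_type"] += 1   # overlapping, but offsets and label both wrong
        else:
            counts["spurious"] += 1            # no overlapping gold span
    for g in gold:
        if not any(overlaps(g, p) for p in pred):
            counts["missed"] += 1              # gold span with no overlapping prediction
    return counts


text = "Fabry disease causes burning pain and angiokeratomas."
gold = [(0, 13, "RAREDISEASE"), (21, 33, "SYMPTOM"), (38, 52, "SIGN")]
pred = [(0, 5, "RAREDISEASE"), (21, 33, "SIGN")]
for start, end, label in pred:
    print(repr(text[start:end]), label)
print(classify_errors(gold, pred))
```

Here "Fabry" alone counts as boundary drift, "burning pain" labeled SIGN counts as type confusion, and "angiokeratomas" is missed. Counts like these indicate which post-processing step, such as span extension or label correction, would help most.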

Rare diseases are individually uncommon but together affect many patients. Clinicians often describe rare conditions, physical findings, and patient-reported symptoms in medical notes, which makes it hard to identify patients for research or follow-up. In this study, we ask whether modern large language models can pull these key terms from text when only limited labeled data are available. Using the public RareDis corpus, we evaluate several ways to use these models, including giving the model instructions alone, adding a small number of labeled examples, adding short background passages retrieved from a reference source, and further training a smaller model on the same task. We find that a few well-selected examples markedly improve extraction of rare disease names at low cost, and further training achieves the best overall accuracy. Adding background passages provides limited average gains, but it can sometimes help capture more true mentions of harder categories such as signs and symptoms. Symptom extraction remains the most challenging because symptom labels are context-dependent and can overlap with objective findings. These results support using large language models as decision-support tools paired with expert review to speed chart screening and rare disease research.
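
The "background passages retrieved from a reference source" step can be sketched as retrieve-then-prepend. This is a minimal illustration under loud assumptions. The paper's retriever and knowledge source are not described in this summary. `retrieve` and `augment` are hypothetical helpers, and lexical Jaccard overlap replaces the dense-embedding retrieval that RAG systems typically use.

```python
import re


def tokens(text):
    """Lowercased word set used for lexical matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, k=1):
    """Return the k passages with the highest Jaccard overlap with the query."""
    q = tokens(query)

    def jaccard(passage):
        t = tokens(passage)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(passages, key=jaccard, reverse=True)[:k]


def augment(query, passages, k=1):
    """Prepend retrieved background passages to an extraction prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Background:\n{context}\n\nText: {query}\nEntities:"


# Toy reference passages (illustrative; not the paper's knowledge source).
passages = [
    "Fabry disease is an X-linked lysosomal storage disorder.",
    "Cystic fibrosis affects the lungs and digestive system.",
]
query = "The patient with Fabry disease reports burning pain."
print(augment(query, passages))
```

The retrieved text gives the model extra context about the condition, which may help it recognize related mentions. Irrelevant or noisy passages can also distract it, which is consistent with RAG's limited average gains.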

## Linked entities

- **Diseases:** rare diseases (MONDO:0021200)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12900354/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12900354/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12900354/full.md

---
Source: https://tomesphere.com/paper/PMC12900354