Improving Large Language Models for Clinical Named Entity Recognition via Prompt Engineering
Yan Hu, Qingyu Chen, Jingcheng Du, Xueqing Peng, Vipina Kuttichi Keloth, Xu Zuo, Yujia Zhou, Zehan Li, Xiaoqian Jiang, Zhiyong Lu, Kirk Roberts, Hua Xu

TL;DR
This study evaluates GPT-3.5 and GPT-4 for clinical named entity recognition and introduces a prompt engineering framework that significantly improves their performance, making them more feasible for clinical applications despite not surpassing specialized models.
Contribution
The paper presents a novel prompt engineering framework tailored for clinical NER tasks that enhances GPT models' performance with minimal training data.
Findings
Prompt components improve GPT performance across tasks
GPT models prompted with all components outperform the same models given only baseline prompts
BioClinicalBERT still outperforms GPT models in accuracy
Abstract
Objective: This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks and proposes task-specific prompts to improve their performance. Materials and Methods: We evaluated these models on two clinical NER tasks: (1) extracting medical problems, treatments, and tests from clinical notes in the MTSamples corpus, following the 2010 i2b2 concept extraction shared task, and (2) identifying nervous system disorder-related adverse events from safety reports in the Vaccine Adverse Event Reporting System (VAERS). To improve the GPT models' performance, we developed a clinical task-specific prompt framework that includes (1) baseline prompts with task description and format specification, (2) annotation guideline-based prompts, (3) error analysis-based instructions, and (4) annotated samples for few-shot learning. We assessed each prompt's…
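The four prompt components listed in the abstract can be pictured as pieces that are concatenated into one prompt. The sketch below is only an illustration of that idea, not the authors' implementation: the class `PromptComponents`, the function `build_prompt`, the section markers, and all prompt wording are hypothetical placeholders, since the page does not reproduce the actual prompts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PromptComponents:
    """The four component types named in the abstract (contents are placeholders)."""
    baseline: str                     # (1) task description + format specification
    guidelines: str = ""              # (2) annotation guideline-based text
    error_instructions: str = ""      # (3) instructions derived from error analysis
    few_shot: List[Tuple[str, str]] = field(default_factory=list)  # (4) (input, annotated output)


def build_prompt(components: PromptComponents, note: str) -> str:
    """Join whichever components are present, then append the note to annotate."""
    parts = [components.baseline]
    if components.guidelines:
        parts.append("Annotation guidelines:\n" + components.guidelines)
    if components.error_instructions:
        parts.append("Additional instructions:\n" + components.error_instructions)
    for i, (text, annotated) in enumerate(components.few_shot, start=1):
        parts.append(f"Example {i}\nInput: {text}\nOutput: {annotated}")
    parts.append("Input: " + note)
    return "\n\n".join(parts)


# Hypothetical wording, for illustration only.
components = PromptComponents(
    baseline="Extract medical problems, treatments, and tests from the clinical note.",
    guidelines="(placeholder) definitions of each entity type",
    error_instructions="(placeholder) reminders targeting common mistakes",
    few_shot=[("(placeholder) sample sentence", "(placeholder) annotated sentence")],
)
prompt = build_prompt(components, "(placeholder) clinical note text")
baseline_only = build_prompt(PromptComponents(baseline=components.baseline), "(placeholder) clinical note text")
```

Leaving a component empty drops it from the prompt, so the same function can produce the baseline-only prompt and each incremental variant when comparing how much each component contributes.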
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Discriminative Fine-Tuning · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · GPT · Transformer
