An evaluation of GPT models for phenotype concept recognition
Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall, Justin T Reese

TL;DR
This study evaluates GPT models for clinical phenotype recognition, demonstrating they can achieve state-of-the-art performance with appropriate prompts, but face challenges like non-determinism and high costs.
Contribution
It is the first comprehensive assessment of GPT models for phenotype annotation, showing their potential and limitations in clinical NLP tasks.
Findings
GPT models achieved up to 0.58 macro F1 on abstracts
GPT models surpassed existing tools on clinical observations
Performance depends on prompt design and setup
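The macro F1 figure above can be made concrete with a small sketch. It assumes that macro F1 means the unweighted mean of per-document F1 scores, each computed over sets of ontology concept IDs; the study's exact evaluation protocol may differ. The function names, the document keys, and the HPO-style IDs are illustrative and do not come from the paper.

```python
from typing import Dict, Set


def f1(gold: Set[str], pred: Set[str]) -> float:
    """F1 between the gold and predicted concept-ID sets of one document."""
    if not gold and not pred:
        # Assumption: predicting nothing for a document with no gold
        # annotations counts as a perfect score.
        return 1.0
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Unweighted mean of per-document F1 over all gold documents."""
    if not gold:
        raise ValueError("no gold documents to evaluate")
    # A document missing from the predictions is scored as an empty prediction.
    scores = [f1(ids, pred.get(doc, set())) for doc, ids in gold.items()]
    return sum(scores) / len(scores)


# Illustrative data only: two documents with HPO-style concept IDs.
gold = {
    "doc1": {"HP:0001250", "HP:0001263"},
    "doc2": {"HP:0000252"},
}
pred = {
    "doc1": {"HP:0001250"},                # one concept missed
    "doc2": {"HP:0000252", "HP:0004322"},  # one spurious concept
}
score = macro_f1(gold, pred)
```

Because each document contributes equally under this definition, a short abstract with one missed concept moves the score as much as a long one with many.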
Abstract
Objective: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. Materials and Methods: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Attention Dropout · Dense Connections · GPT
