Automating clinical phenotyping using natural language processing

Linea Schmidt; Susanne Ibing; Florian Borchert; Julian Hugo; Allison A. Marshall; Jellyana Peraza; Judy H. Cho; Erwin P. Böttinger; Bernhard Y. Renard; Ryan C. Ungaro

PMC · DOI: 10.1038/s43856-025-01337-0 · January 14, 2026

Open Access

TL;DR

This study compares rule-based NLP and GPT-4 for extracting Crohn’s disease features from clinical notes, showing high accuracy and potential to automate chart reviews.

Contribution

This is the first study to explore LLM-based phenotyping of Crohn’s disease sub-phenotypes, using sentence-level datasets and a direct comparison with rule-based methods.

Findings

01. GPT-4 achieved F1 scores of at least 0.90 for disease behavior and 0.82 for age at diagnosis at the note level.

02. Combining rule-based and LLM approaches improved precision and enabled prioritization of chart reviews.

03. Performance was comparable to human experts with no statistically significant difference.

Abstract

Real-world studies based on electronic health records often require manual chart review to derive patients’ clinical phenotypes, a labor-intensive task with limited scalability. Here, we developed and compared two computable phenotyping approaches for sub-phenotyping patients with Crohn’s disease by age at diagnosis and disease behavior: a rule-based approach built on the spaCy framework and an approach based on a Large Language Model (LLM), GPT-4. The underlying data included 49,572 clinical notes and 2,204 radiology reports from 584 Crohn’s disease patients. A test set of 280 clinical texts was labeled at the sentence level, in addition to patient-level ground truth data. The algorithms were evaluated on recall, precision, specificity, and F1 score. Overall, we observe similar or…
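As a reading aid, the four evaluation metrics named in the abstract can be sketched from binary confusion counts. This is a minimal illustration using the standard definitions, not the authors' evaluation code; the `Confusion` class, the helper names, and the toy labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts (hypothetical helper, not from the paper)."""
    tp: int
    fp: int
    tn: int
    fn: int


def safe_div(num: float, den: float) -> float:
    # Return 0.0 when the denominator is zero instead of raising.
    return num / den if den else 0.0


def count_confusion(gold: list[int], pred: list[int]) -> Confusion:
    # gold and pred are parallel lists of 0/1 labels, one per text.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def evaluate(c: Confusion) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)    # correct among predicted positives
    recall = safe_div(c.tp, c.tp + c.fn)       # found among true positives
    specificity = safe_div(c.tn, c.tn + c.fp)  # correct among true negatives
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Toy labels invented for illustration only (1 = feature present in the text).
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = evaluate(count_confusion(gold, pred))
```

The same counts could be aggregated at sentence, note, or patient level; which aggregation the study used for each reported score is not stated in this excerpt.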

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text, each resolved to its canonical identifier and authoritative record.

Species (1)

Homo sapiens (human · species)

Chemicals (2)

GPT-4 · luminal

Diseases (13)

Crohn's disease · LLM · digestive diseases · Strictures · fistulas · infection · carotid stenosis · CD · IBD · disease · abscess · inflammation · Perianal disease

Taxonomy

Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Artificial Intelligence in Healthcare and Education