Evaluating ChatGPT text-mining of clinical records for obesity monitoring

Ivo S. Fins (1), Heather Davies (1), Sean Farrell (2), Jose R. Torres (3), Gina Pinchbeck (1), Alan D. Radford (1), Peter-John Noble (1) ((1) Small Animal Veterinary Surveillance Network, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK; (2) Department of Computer Science, Durham University, Durham, UK; (3) Institute for Animal Health and Food Safety, University of Las Palmas de Gran Canaria, Las Palmas, Canary Archipelago, Spain)

arXiv:2308.01666·cs.IR·August 4, 2023·1 cites

Open Access

TL;DR

This study compares ChatGPT and regex-based methods for extracting obesity scores from veterinary clinical narratives, highlighting ChatGPT's higher recall but lower precision, and discusses the potential and limitations of large language models in clinical data extraction.

Contribution

It demonstrates the application of ChatGPT for extracting clinical information from veterinary narratives and compares its performance with traditional regex methods.

Findings

01

ChatGPT achieved higher recall (100%) than regex (72.6%).

02

Regex achieved higher precision (100%) than ChatGPT (89.3%).

03

Prompt engineering is crucial for improving ChatGPT's output accuracy.
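The precision and recall figures above follow the standard definitions: precision is the share of extracted scores that are correct, and recall is the share of true scores that were extracted. A minimal sketch of that arithmetic is shown below. The counts are hypothetical, chosen only for illustration; the underlying counts from the study are not given here.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. Returns 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not taken from the paper).
p, r = precision_recall(tp=9, fp=1, fn=3)
print(f"precision={p:.2f} recall={r:.2f}")
```

A method that never returns a wrong score but misses some has perfect precision and reduced recall, which is the pattern reported for regex; the reverse pattern is the one reported for ChatGPT.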

Abstract

Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either RegexT or by appending the narrative to a prompt sent to ChatGPT coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large…
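The two extraction routes described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the pattern `BCS_PATTERN` is a hypothetical stand-in for RegexT (which is not reproduced here), and the instruction text in `build_prompt` is invented for the example rather than taken from the study. No model call is made; the function only shows the "append the narrative to a prompt" step.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches forms such as "BCS 7/9" or "bcs: 3.5/5".
# It is NOT the RegexT used in the study.
BCS_PATTERN = re.compile(
    r"\bBCS\s*[:=]?\s*(\d(?:\.\d)?)\s*/\s*(5|9)\b", re.IGNORECASE
)


def extract_bcs(narrative: str) -> Optional[Tuple[float, int]]:
    """Return (score, scale) for the first BCS mention, or None if absent."""
    match = BCS_PATTERN.search(narrative)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def build_prompt(narrative: str) -> str:
    """Append the narrative to a fixed instruction (wording is assumed)."""
    instruction = (
        "Report the body condition score (BCS) stated in the following "
        "veterinary clinical narrative, or answer 'none' if it is not given."
    )
    return instruction + "\n\n" + narrative


print(extract_bcs("BAR. BCS 7/9, advised diet."))
print(extract_bcs("Overweight, needs to lose weight."))
```

The second narrative has no explicit score in a form the pattern recognises, so the regex route returns nothing. Rigid patterns of this kind rarely produce wrong matches but can miss differently worded records, consistent with the high-precision, lower-recall result reported for RegexT.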

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling