Classification performance and reproducibility of GPT-4 omni for information extraction from veterinary electronic health records
Judit M Wulcan, Kevin L Jacques, Mary Ann Lee, Samantha L Kovacs, Nicole Dausend, Lauren E Prince, Jonatan Wulcan, Sina Marsilio, Stefan M Keller

TL;DR
This study evaluates GPT-4 omni's ability to extract clinical signs from veterinary electronic health records (EHRs). It shows high accuracy and reproducibility, outperforms GPT-3.5 Turbo, and performs consistently across temperature settings.
Contribution
The study provides a comprehensive comparison of GPT-4 omni and GPT-3.5 Turbo for information extraction from veterinary EHRs, highlighting GPT-4 omni's superior performance and its stability across temperature settings.
Findings
At temperature 0, GPT-4 omni achieved 96.9% sensitivity and 97.6% specificity relative to the majority opinion of human respondents.
GPT-4 omni outperformed GPT-3.5 Turbo, especially in sensitivity.
Reproducibility of GPT-4 omni was higher than human interobserver agreement.
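The reproducibility finding compares agreement between repeated GPT-4o runs with agreement between human raters. This summary does not name the agreement statistic the authors used. As an assumption, the sketch below uses Cohen's kappa, a common chance-corrected measure of agreement between two raters (or two model runs) on binary labels. The function name and the toy labels are hypothetical and are not data from the study.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary (True/False) labels.

    Hypothetical helper for illustration; the paper's own agreement
    statistic is not stated in this summary.
    """
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of records where both runs give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each run's marginal rate of positive labels.
    pa = sum(a) / n
    pb = sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        # Both runs assign one identical constant label; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Toy example: two runs labelling one clinical sign in eight records.
run_1 = [True, True, False, False, True, False, False, False]
run_2 = [True, True, False, False, False, False, False, False]
print(round(cohens_kappa(run_1, run_2), 3))
```

Kappa discounts the agreement two raters would reach by chance, which matters when most records are negative for a sign: here raw agreement is 7/8, but kappa is about 0.714.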
Abstract
Large language models (LLMs) can extract information from veterinary electronic health records (EHRs), but performance differences between models, the effect of temperature settings, and the influence of text ambiguity have not previously been evaluated. This study addresses these gaps by comparing the performance of GPT-4 omni (GPT-4o) and GPT-3.5 Turbo under different conditions and by investigating the relationship between human interobserver agreement and LLM errors. The LLMs and five humans were tasked with identifying six clinical signs associated with feline chronic enteropathy in 250 EHRs from a veterinary referral hospital. At temperature 0, compared with the majority opinion of human respondents, GPT-4o achieved 96.9% sensitivity (interquartile range [IQR] 92.9-99.3%), 97.6% specificity (IQR 96.5-98.5%), 80.7% positive predictive value (IQR 70.8-84.6%), 99.5%…
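The abstract scores GPT-4o against the majority opinion of five human respondents. A minimal sketch of how sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) follow from such a majority-vote reference is shown below. The function names and the toy ratings are hypothetical, and the study's exact procedure (for example, how the IQRs were computed) is not described in this summary.

```python
def majority_vote(ratings):
    """True if more than half of the binary human ratings are positive."""
    return sum(bool(r) for r in ratings) * 2 > len(ratings)


def _ratio(num, den):
    # Return NaN rather than raising when a denominator is zero.
    return num / den if den else float("nan")


def classification_metrics(predicted, reference):
    """Compare binary LLM predictions with a binary reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy data for one clinical sign: five human ratings per record.
human_ratings = [
    [True, True, True, False, False],
    [False, False, False, True, False],
    [True, True, False, True, True],
    [False, False, False, False, False],
    [False, True, False, False, False],
    [True, True, True, True, True],
]
reference = [majority_vote(r) for r in human_ratings]
llm_predictions = [True, False, True, False, True, True]

metrics = classification_metrics(llm_predictions, reference)
print({k: round(v, 3) for k, v in metrics.items()})
```

In this toy case the model finds every positive record (sensitivity 1.0) but raises one false positive, so specificity is 2/3 and PPV is 0.75. PPV is sensitive to how rare a sign is, which is one reason it can sit well below sensitivity and specificity, as in the reported results.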
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Warmup With Cosine Annealing
