Who Fails Where? LLM and Human Error Patterns in Endometriosis Ultrasound Report Extraction
Haiyi Li, Yutong Li, Yiheng Chi, Alison Deslandes, Mathew Leonardi, Shay Freger, Yuan Zhang, Jodie Avery, M. Louise Hull, and Hsiang-Ting Chen

TL;DR
This study evaluates large language models (LLMs) for converting unstructured ultrasound reports into structured data, highlighting their strengths, their limitations, and the importance of human-AI collaboration in clinical workflows.
Contribution
It shows that a larger (20B-parameter) LLM is effective at structuring clinical reports and emphasizes the complementary roles of AI and human experts in medical data extraction.
Findings
The 20B-parameter LLM achieved a mean accuracy of 86.02%
LLMs excel at syntactic consistency tasks
Humans outperform LLMs in semantic interpretation
Abstract
In this study, we evaluate a locally deployed large language model (LLM) to convert unstructured endometriosis transvaginal ultrasound (eTVUS) scan reports into structured data for imaging informatics workflows. Across 49 eTVUS reports, we compared three LLMs (7B/8B and a 20B-parameter model) against expert human extraction. The 20B model achieved a mean accuracy of 86.02%, substantially outperforming smaller models and confirming the importance of scale in handling complex clinical text. Crucially, we identified a highly complementary error profile: the LLM excelled at syntactic consistency (e.g., date/numeric formatting) where humans faltered, while human experts provided superior semantic and contextual interpretation. We also found that the LLM's semantic errors were fundamental limitations that could not be mitigated by simple prompt engineering. These findings strongly support a…
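The abstract reports a mean accuracy but this summary does not say how it was computed. As a minimal sketch only, assuming accuracy is the fraction of structured fields in a report that exactly match a reference annotation, averaged over reports, the calculation might look like the following. The field names, values, and both function names are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Sentinel so that a field missing from the prediction never matches,
# even when the reference value is None.
_MISSING = object()


def report_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of reference fields whose predicted value matches exactly.

    Assumption: exact match per field; the paper's scoring rule may differ.
    """
    if not reference:
        raise ValueError("reference must contain at least one field")
    correct = sum(
        1
        for field, value in reference.items()
        if predicted.get(field, _MISSING) == value
    )
    return correct / len(reference)


def mean_accuracy(pairs) -> float:
    """Average the per-report accuracy over (predicted, reference) pairs."""
    return mean(report_accuracy(pred, ref) for pred, ref in pairs)


# Hypothetical fields for illustration only.
reference = {
    "scan_date": "2023-05-01",
    "left_ovary_endometrioma": "yes",
    "uterus_length_mm": 78,
}
predicted = {
    "scan_date": "2023-05-01",
    "left_ovary_endometrioma": "no",  # a semantic error
    "uterus_length_mm": 78,
}

per_report = report_accuracy(predicted, reference)
overall = mean_accuracy([(predicted, reference), (reference, reference)])
```

Scoring each field separately, rather than each report as a whole, is what would allow syntactic errors (such as date formatting) and semantic errors to be tallied separately, which is the kind of comparison the abstract describes.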
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Radiology practices and education
