EndoExtract: Co-Designing Structured Text Extraction from Endometriosis Ultrasound Reports
Haiyi Li, Yiyang Zhao, Yutong Li, Alison Deslandes, Jodie Avery, Mathew Leonardi, Mary Louise Hull, and Hsiang-Ting Chen

TL;DR
EndoExtract is a system that uses large language models to extract structured information from unstructured endometriosis ultrasound reports, improving workflow efficiency and reducing manual effort.
Contribution
This paper introduces EndoExtract, an on-premise LLM-powered system for extracting structured data from endometriosis ultrasound reports, with a user-centered interface for clinical review.
Findings
Supports shift from manual data entry to supervisory validation
Highlights source evidence within PDFs for verification
Addresses workflow pain points identified through user research
Abstract
Endometriosis ultrasound reports are often unstructured free-text documents that require manual abstraction for downstream tasks such as analytics, machine learning model training, and clinical auditing. We present EndoExtract, an on-premise LLM-powered system that extracts structured data from these reports and surfaces interpretive fields for human review. Through contextual inquiry with research assistants, we identified key workflow pain points: asymmetric trust between numerical and interpretive fields, repetitive manual highlighting, fatigue from sustained comparison, and terminology inconsistency across radiologists. These findings informed an interface that surfaces only interpretive fields for mandatory review, automatically highlights source evidence within PDFs, and separates batch extraction from human-paced verification. A formative workshop revealed that…
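The review-routing idea described in the abstract (only interpretive fields go to mandatory review, each with its source evidence located for highlighting) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ExtractedField` type, the `kind` labels, the helper names, and the sample report text are all hypothetical, and evidence is located by plain substring search in extracted text rather than inside a PDF.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractedField:
    """One extracted value (hypothetical schema, not from the paper)."""
    name: str
    value: str
    kind: str      # assumed labels: "numerical" or "interpretive"
    evidence: str  # verbatim snippet of the report the value came from


def fields_for_review(fields: List[ExtractedField]) -> List[ExtractedField]:
    """Keep only interpretive fields, which need mandatory human review."""
    return [f for f in fields if f.kind == "interpretive"]


def evidence_span(report_text: str, field: ExtractedField) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the evidence, or None if absent."""
    start = report_text.find(field.evidence)
    if start == -1:
        return None
    return (start, start + len(field.evidence))


# Invented report text and fields, for illustration only.
report = "Uterus anteverted. Left ovary 32 mm. Sliding sign negative."
fields = [
    ExtractedField("left_ovary_size_mm", "32", "numerical", "Left ovary 32 mm"),
    ExtractedField("sliding_sign", "negative", "interpretive", "Sliding sign negative"),
]

queue = fields_for_review(fields)
for f in queue:
    span = evidence_span(report, f)
    if span is not None:
        start, end = span
        print(f"{f.name} = {f.value!r}; evidence: {report[start:end]!r}")
```

Keeping extraction output and the review queue as separate steps mirrors the abstract's separation of batch extraction from human-paced verification: the field list can be produced in bulk, and the queue consumed later at the reviewer's own pace.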
Taxonomy
Topics: Radiology practices and education · Biomedical Text Mining and Ontologies · Topic Modeling
