Leveraging LLMs for Structured Data Extraction from Unstructured Patient Records
Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V.K. Cody Bumgardner

TL;DR
This paper introduces a secure, scalable framework using locally deployed large language models to automate extraction of structured data from unstructured clinical notes, aiming to reduce manual review effort and improve data consistency in clinical research.
Contribution
It presents a novel modular system integrating retrieval augmented generation with LLMs for accurate, secure, and scalable extraction of clinical features from unstructured EHR narratives.
Findings
Achieved high accuracy in extracting medical features from patient notes.
Identified annotation errors missed in manual review.
Demonstrated potential to reduce manual chart review workload.
Abstract
Manual chart review remains an extremely time-consuming and resource-intensive component of clinical research, requiring experts to extract often complex information from unstructured electronic health record (EHR) narratives. We present a secure, modular framework for automated structured feature extraction from clinical notes leveraging locally deployed large language models (LLMs) on institutionally approved, Health Insurance Portability and Accountability Act (HIPAA)-compliant compute infrastructure. This system integrates retrieval augmented generation (RAG) and structured response methods of LLMs into a widely deployable and scalable container to provide feature extraction for diverse clinical domains. In evaluation, the framework achieved high accuracy across multiple medical characteristics present in large bodies of patient notes when compared against an expert-annotated…
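The abstract names two ingredients, retrieval over patient notes and structured LLM responses, without giving implementation details here. The following is a minimal sketch of how those two pieces could fit together, not the authors' system. Everything in it is an assumption made for illustration: word-overlap scoring stands in for whatever retriever the framework uses, `fake_llm` is a hypothetical stub in place of a locally hosted model, and the `value`/`evidence` response schema and the sample notes are invented.

```python
import json
import re
from typing import Callable

# Invented sample notes; not data from the paper.
NOTES = [
    "Patient presents with cough and fever for three days.",
    "Past medical history: type 2 diabetes, managed with metformin.",
    "Follow-up scheduled in two weeks.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query.

    Word overlap is a simple stand-in for embedding similarity.
    """
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(feature: str, allowed: list[str], context: list[str]) -> str:
    """Assemble a prompt that asks for a JSON answer restricted to allowed values."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        f"Notes:\n{joined}\n\n"
        f"Feature: {feature}\n"
        f"Answer with JSON: {{\"value\": one of {allowed}, \"evidence\": \"<quote>\"}}"
    )


def parse_response(raw: str, allowed: list[str]) -> dict:
    """Validate the model output against the (assumed) response schema."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    if data.get("value") not in allowed:
        raise ValueError(f"value must be one of {allowed}")
    if not isinstance(data.get("evidence"), str):
        raise ValueError("evidence must be a string")
    return {"value": data["value"], "evidence": data["evidence"]}


def extract_feature(
    chunks: list[str], feature: str, allowed: list[str], llm: Callable[[str], str]
) -> dict:
    """Retrieve relevant chunks, prompt the model, and validate its answer."""
    context = retrieve(chunks, feature, k=1)
    return parse_response(llm(build_prompt(feature, allowed, context)), allowed)


def fake_llm(prompt: str) -> str:
    """Hypothetical stub; a real deployment would call a local model here."""
    return json.dumps({"value": "yes", "evidence": "type 2 diabetes"})


result = extract_feature(NOTES, "history of diabetes", ["yes", "no", "unknown"], fake_llm)
```

Rejecting any response that falls outside the allowed values is what makes the output usable as a structured field: a malformed or off-schema answer fails loudly instead of entering the dataset.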
Taxonomy
Topics: Electronic Health Records Systems · Machine Learning in Healthcare · Topic Modeling
