Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems
Zhe Fei, Mehmet Yigit Turali, Shreyas Rajesh, Xinyang Dai, Huyen Pham, Pavan Holur, Yuhui Zhu, Larissa Mooney, Yih-Ing Hser, Vwani Roychowdhury

TL;DR
This paper presents a framework that customizes open source large language models to extract standardized medication attributes from heterogeneous EHR data, enabling consistent analysis of opioid use disorder treatments across multiple clinics.
Contribution
The study introduces a practical pipeline that adapts open source LLMs for extracting MOUD prescription data from diverse EHR systems, improving coverage and accuracy while supporting privacy-preserving deployment.
Findings
Qwen2.5-32B achieves 93.4% coverage and 93.0% accuracy.
MedGemma-27B achieves 93.1% coverage and 92.2% accuracy.
Error analysis led to targeted fixes for missing data and unit misinterpretations.
Abstract
Harmonizing medication data across Electronic Health Record (EHR) systems is a persistent barrier to monitoring medications for opioid use disorder (MOUD). In heterogeneous EHR systems, key prescription attributes are scattered across differently formatted fields and free-text notes. We present a practical framework that customizes open-source large language models (LLMs), including Llama, Qwen, Gemma, and MedGemma, to extract a unified set of MOUD prescription attributes (prescription date, drug name, duration, total quantity, daily quantity, and refills) from heterogeneous, site-specific data and compute a standardized metric of medication coverage, MOUD days, per patient. Our pipeline processes records directly in a fixed JSON schema, followed by lightweight normalization and cross-field consistency checks. We evaluate the system on prescription-level EHR data from five clinics…
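To make the post-extraction steps concrete, here is a minimal Python sketch of what normalization, a cross-field consistency check, and a per-patient MOUD days count could look like. It is not the authors' implementation. The JSON key names (`duration_days`, `total_quantity`, and so on), the 10% tolerance, the rule that refills repeat the full duration, and the definition of MOUD days as the number of distinct calendar days covered are all assumptions made for illustration; the abstract does not specify them.

```python
import json
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass
class Prescription:
    """One extracted record. Field names are hypothetical, not the paper's schema."""
    prescription_date: date
    drug_name: str
    duration_days: Optional[float]
    total_quantity: Optional[float]
    daily_quantity: Optional[float]
    refills: int = 0


def parse_record(raw_json: str) -> Prescription:
    """Parse one LLM output string (assumed to be a flat JSON object) and normalize types."""
    obj = json.loads(raw_json)

    def num(key: str) -> Optional[float]:
        value = obj.get(key)
        return None if value is None else float(value)

    return Prescription(
        prescription_date=date.fromisoformat(obj["prescription_date"]),
        drug_name=obj["drug_name"].strip().lower(),
        duration_days=num("duration_days"),
        total_quantity=num("total_quantity"),
        daily_quantity=num("daily_quantity"),
        refills=int(obj.get("refills") or 0),
    )


def fill_and_check(rx: Prescription, rel_tol: float = 0.1) -> bool:
    """Cross-field check: duration should roughly equal total / daily quantity.

    Fills a missing duration from the two quantities when possible.
    Returns True if the record is usable and internally consistent.
    """
    if rx.total_quantity and rx.daily_quantity:
        implied = rx.total_quantity / rx.daily_quantity
        if rx.duration_days is None:
            rx.duration_days = implied
            return True
        return abs(implied - rx.duration_days) <= rel_tol * max(implied, rx.duration_days)
    return rx.duration_days is not None


def moud_days(prescriptions: Iterable[Prescription]) -> int:
    """Count distinct covered calendar days (assumed definition of MOUD days)."""
    covered = set()
    for rx in prescriptions:
        if rx.duration_days is None:
            continue
        # Assumption: each refill repeats the full duration back to back.
        n_days = int(round(rx.duration_days * (1 + rx.refills)))
        covered.update(rx.prescription_date + timedelta(days=i) for i in range(n_days))
    return len(covered)


# Two made-up records for one patient; the second lacks an explicit duration.
raw_records = [
    '{"prescription_date": "2024-01-01", "drug_name": " Buprenorphine ", '
    '"duration_days": 30, "total_quantity": 60, "daily_quantity": 2, "refills": 0}',
    '{"prescription_date": "2024-01-21", "drug_name": "buprenorphine", '
    '"duration_days": null, "total_quantity": 28, "daily_quantity": 2, "refills": 0}',
]
records = [parse_record(r) for r in raw_records]
valid = [rx for rx in records if fill_and_check(rx)]
total_moud_days = moud_days(valid)
```

Counting distinct days rather than summing durations keeps overlapping prescriptions from being double counted, which is one plausible reading of a coverage metric; a different definition would change only `moud_days`.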
