PVminerLLM: Structured Extraction of Patient Voice from Patient-Generated Text using Large Language Models
Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Ashley Hagaman, Sarah Lowe, Aimee Roundtree

TL;DR
This paper presents PVminerLLM, a supervised fine-tuned large language model that extracts structured patient voice information from patient-generated text and outperforms prompt-based baselines at identifying social and experiential signals for healthcare research.
Contribution
Introduction of PVminerLLM, a supervised fine-tuned LLM that outperforms prompt-based methods in extracting structured patient voice signals from text.
Findings
Achieves up to 87.03% F1 in evidence span extraction.
Outperforms prompt-based baselines across multiple datasets.
Effective even with smaller models, enabling scalable analysis.
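The F1 figures above combine precision and recall over extracted items. As a minimal sketch of how a span-level F1 could be computed, the snippet below assumes exact-match scoring on character offsets. The matching criterion, the `span_f1` helper, and the example offsets are illustrative assumptions and are not taken from the paper, which may use a different (for example, overlap-based) criterion.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def span_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-match F1 between predicted and gold evidence spans (assumed criterion)."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up offsets for illustration only.
gold = {(0, 12), (30, 47), (60, 71)}
predicted = {(0, 12), (30, 47), (80, 95), (100, 110)}
score = span_f1(predicted, gold)  # precision 2/4, recall 2/3
```

With two of four predicted spans correct and two of three gold spans recovered, precision is 0.5 and recall is about 0.667, so the harmonic mean penalizes the spurious predictions more than a simple average would.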
Abstract
Motivation: Patient-generated text contains critical information about patients' lived experiences, social circumstances, and engagement in care, including factors that strongly influence adherence, care coordination, and health equity. However, these patient voice signals are rarely available in structured form, limiting their use in patient-centered outcomes research and clinical quality improvement. Reliable extraction of such information is therefore essential for understanding and addressing non-clinical drivers of health outcomes at scale.

Results: We introduce PVminer, a benchmark for structured extraction of patient voice, and propose PVminerLLM, a supervised fine-tuned large language model tailored to this task. Across multiple datasets and model sizes, PVminerLLM substantially outperforms prompt-based baselines, achieving up to 83.82% F1 for Code prediction, 80.74% F1 for…
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Voice and Speech Disorders
