PVminer: A Domain-Specific Tool to Detect the Patient Voice in Patient Generated Data

Samah Fodeh; Linhai Ma; Yan Wang; Srivani Talakokkul; Ganesh Puthiaraju; Afshan Khan; Ashley Hagaman; Sarah Lowe; Aimee Roundtree

arXiv:2602.21165·cs.CL·February 25, 2026

TL;DR

PVminer is a domain-specific NLP framework that detects and structures the patient voice in patient-generated text. It combines patient-specific BERT encoders with topic modeling, and the authors report better performance than baseline models.

Contribution

The paper introduces PVminer, a novel NLP framework that combines patient-specific BERT encoders and topic modeling for multi-label detection of patient voice in healthcare communication.

Findings

01 PVminer outperforms baseline models with F1 scores over 80%.

02 Incorporating author identity and topic augmentation improves detection accuracy.

03 Pre-trained models and datasets will be publicly released.

Abstract

Patient-generated text such as secure messages, surveys, and interviews contains rich expressions of the patient voice (PV), reflecting communicative behaviors and social determinants of health (SDoH). Traditional qualitative coding frameworks are labor intensive and do not scale to large volumes of patient-authored messages across health systems. Existing machine learning (ML) and natural language processing (NLP) approaches provide partial solutions but often treat patient-centered communication (PCC) and SDoH as separate tasks or rely on models not well suited to patient-facing language. We introduce PVminer, a domain-adapted NLP framework for structuring patient voice in secure patient-provider communication. PVminer formulates PV detection as a multi-label, multi-class prediction task integrating patient-specific BERT encoders (PV-BERT-base and PV-BERT-large), unsupervised topic…
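
The multi-label formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each label is scored independently, passed through a sigmoid, and thresholded, and that quality is summarized with a micro-averaged F1. The label names, logits, and threshold below are hypothetical and are not taken from the paper; PVminer's actual label set, architecture, and evaluation protocol are not specified in the text above.

```python
import math

# Hypothetical label names, for illustration only (not the paper's code set).
LABELS = ["information_seeking", "emotional_expression", "sdoh_housing"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision: every label is thresholded independently,
    so a message can receive zero, one, or several labels."""
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of gold and predicted label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels predicted but not gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up logits for two messages (assumed values, not real model output).
logits_batch = [[2.0, -1.0, 0.3], [-3.0, 1.5, -0.2]]
predictions = [predict_labels(z) for z in logits_batch]

# Made-up gold annotations for the same two messages.
gold = [
    {"information_seeking", "sdoh_housing"},
    {"emotional_expression", "sdoh_housing"},
]
score = micro_f1(gold, predictions)
```

In this toy example the second message misses one gold label, so recall drops while precision stays perfect. In a real system the logits would come from a classification head on top of an encoder rather than from hand-written numbers.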

Taxonomy

Topics: Machine Learning in Healthcare · Health Literacy and Information Accessibility · Topic Modeling