PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

Sudip Bhujel

arXiv:2603.03054·cs.CL·March 10, 2026



Open Access

TL;DR

PrivMedChat is an end-to-end differentially private RLHF framework for medical dialogue systems that aims to protect sensitive clinical conversations while preserving utility and safety.

Contribution

The paper develops a DP-RLHF approach for medical chatbots that combines an annotation-free method for constructing preference pairs with differentially private training at each stage that accesses dialogue-derived supervision.

Findings

01. Effective privacy guarantees demonstrated across tasks

02. Maintains utility and safety in medical dialogues

03. Open-source implementation available

Abstract

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization, enabling membership inference and disclosure of rare training-set details. We present PrivMedChat (Private Medical Chat), an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue systems. Our approach enforces differential privacy at each training stage that accesses dialogue-derived supervision, combining DP-SGD for supervised fine-tuning and reward model learning from preference pairs, and DP-aware policy optimization for alignment. To avoid costly clinician labeling, we introduce an…
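The abstract names DP-SGD as the mechanism for the supervised fine-tuning and reward-model stages. As a rough illustration only, the sketch below shows one DP-SGD step (per-example gradient clipping followed by Gaussian noise) on a pairwise preference loss. It assumes a toy linear reward model over fixed feature vectors rather than a language model, and the function name `dp_sgd_step` and its parameters (`clip_norm`, `noise_multiplier`, `lr`) are hypothetical choices for this example, not taken from the paper. It does no privacy accounting, so it does not by itself yield an (ε, δ) guarantee.

```python
import numpy as np

def dp_sgd_step(w, chosen, rejected, clip_norm=1.0, noise_multiplier=1.0,
                lr=0.1, rng=None):
    """One DP-SGD step for a toy linear reward model r(x) = w @ x.

    Assumption: each preference pair is given as two feature vectors
    (chosen, rejected); the loss is -log(sigmoid(r(chosen) - r(rejected))).
    """
    rng = np.random.default_rng() if rng is None else rng
    diff = chosen - rejected                 # shape (batch, dim)
    margin = diff @ w                        # shape (batch,)

    # Per-example gradient of the pairwise logistic loss w.r.t. w:
    # d/dw -log(sigmoid(m)) = -sigmoid(-m) * diff
    per_example_grads = -(1.0 / (1.0 + np.exp(margin)))[:, None] * diff

    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale[:, None]

    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(diff)

    return w - lr * noisy_mean
```

Clipping bounds how much any single preference pair can move the parameters, and the noise scale is tied to that bound; in practice a privacy accountant would track the cumulative budget over all steps.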

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Multimodal Machine Learning Applications