PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
Sudip Bhujel

TL;DR
PrivMedChat introduces an end-to-end framework for differentially private reinforcement learning from human feedback (DP-RLHF) in medical dialogue systems. It enforces differential privacy at every training stage that touches doctor-patient conversations while maintaining utility and safety in sensitive clinical dialogue.
Contribution
PrivMedChat develops a DP-RLHF pipeline for medical chatbots. The pipeline applies differential privacy to each training stage and includes an annotation-free method for constructing preference pairs, which avoids costly clinician labeling.
Findings
Effective privacy guarantees demonstrated across tasks
Maintains utility and safety in medical dialogues
Open-source implementation available
Abstract
Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning and reinforcement learning from human feedback (RLHF) can amplify memorization, enabling membership inference and disclosure of rare training-set details. We present PrivMedChat (Private Medical Chat), an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue systems. Our approach enforces differential privacy at each training stage that accesses dialogue-derived supervision, combining DP-SGD for supervised fine-tuning and reward model learning from preference pairs, and DP-aware policy optimization for alignment. To avoid costly clinician labeling, we introduce an…
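The abstract names DP-SGD as the mechanism for both supervised fine-tuning and reward-model learning from preference pairs. The following is a minimal sketch of one DP-SGD step for reward learning. It makes several assumptions that the listing does not confirm: a toy linear reward model r(x) = w·x over fixed feature vectors, and the standard Bradley-Terry pairwise loss. The paper's actual reward architecture and loss are not given here. Each pair's gradient is clipped to L2 norm `clip_norm`. The clipped gradients are then summed, Gaussian noise with standard deviation `noise_multiplier * clip_norm` is added, and the result is averaged. All names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are hypothetical and are not PrivMedChat's API. Privacy accounting, which converts the sampling rate, noise level, and step count into (ε, δ), is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bt_loss_grad(w, chosen, rejected):
    """Per-pair gradient of the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))
    for a hypothetical linear reward model r(x) = w . x."""
    diff = [c - r for c, r in zip(chosen, rejected)]
    margin = sum(wi * di for wi, di in zip(w, diff))
    # d/dw [-log sigmoid(m)] = -(1 - sigmoid(m)) * diff = -sigmoid(-m) * diff
    coeff = -sigmoid(-margin)
    return [coeff * di for di in diff]


def clip(g, clip_norm):
    """Scale a gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def dp_sgd_step(w, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: per-example clipping, summed gradients,
    Gaussian noise N(0, (noise_multiplier * clip_norm)^2), then averaging."""
    summed = [0.0] * len(w)
    for chosen, rejected in batch:
        g = clip(bt_loss_grad(w, chosen, rejected), clip_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    n = len(batch)
    return [wi - lr * gi / n for wi, gi in zip(w, noisy)]
```

Clipping bounds how much any single preference pair (and thus any single dialogue) can move the update, and the noise is calibrated to that bound. With `noise_multiplier=0.0` the step reduces to clipped, non-private SGD, which is a convenient sanity check. A production pipeline would typically rely on a DP library with a built-in privacy accountant rather than this hand-rolled loop.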
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Multimodal Machine Learning Applications
