Exploring Robustness in Doctor-Patient Conversation Summarization: An Analysis of Out-of-Domain SOAP Notes
Yu-Wen Chen, Julia Hirschberg

TL;DR
This paper evaluates the robustness of summarization models for doctor-patient conversations when applied to out-of-domain SOAP notes, revealing insights into format mismatch and model hallucinations.
Contribution
It compares general and SOAP-oriented summarization models, analyzing their performance and limitations on out-of-domain medical conversation data.
Findings
Format mismatch is not the main cause of performance decline.
Models exhibit hallucinations and missing information in summaries.
SOAP note structure influences model output quality.
Abstract
Summarizing medical conversations poses unique challenges due to the specialized domain and the difficulty of collecting in-domain training data. In this study, we investigate the performance of state-of-the-art generative summarization models for doctor-patient conversations on out-of-domain data. We consider two configurations of the doctor-patient conversation summarization model: (1) a general model, without specifying subjective (S), objective (O), and assessment (A) and plan (P) notes; (2) a SOAP-oriented model that generates a summary with SOAP sections. We analyze the limitations and strengths of fine-tuned language model-based methods and GPTs on both configurations. We also conduct a Linguistic Inquiry and Word Count analysis to compare the SOAP notes from different datasets. The results exhibit a strong correlation for reference notes across different datasets,…
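The cross-dataset comparison mentioned in the abstract can be illustrated with a minimal sketch: each note is reduced to a vector of word-category proportions, and two such vectors are compared with a Pearson correlation. This is not the authors' pipeline. The actual LIWC dictionary is proprietary, so `TOY_LEXICON`, its three categories, and the helper names `category_profile` and `pearson` are hypothetical stand-ins chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon. The real LIWC dictionary is proprietary and
# far larger; these categories and words are illustrative assumptions.
TOY_LEXICON = {
    "negation": {"no", "not", "denies", "without"},
    "body": {"chest", "heart", "lungs", "abdomen"},
    "time": {"today", "weeks", "days", "daily"},
}


def category_profile(text):
    """Return the proportion of tokens in each category, in sorted category order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    counts = Counter()
    for tok in tokens:
        for cat, words in TOY_LEXICON.items():
            if tok in words:
                counts[cat] += 1
    return [counts[cat] / total for cat in sorted(TOY_LEXICON)]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / (sx * sy)
```

In use, one would average `category_profile` over all reference notes in each dataset and pass the two averaged vectors to `pearson`. With only three toy categories the correlation is not meaningful; a realistic analysis would need a full category dictionary.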
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
