Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations
Zijie Liu, Xinyu Zhao, Jie Peng, Zhuangdi Zhu, Qingyu Chen, Kaidi Xu, Xia Hu, Tianlong Chen

TL;DR
This paper introduces a new benchmark and dialogue-based fine-tuning approach for medical language models, improving their reasoning and robustness in realistic diagnostic scenarios with noise and complexity.
Contribution
It presents a novel benchmark simulating real-world clinical reasoning and demonstrates that dialogue fine-tuning enhances model performance over traditional static training methods.
Findings
9.64% improvement in multi-round reasoning
6.18% accuracy boost in noisy environments
Dialogue tuning outperforms static datasets in clinical reasoning tasks
Abstract
Current medical AI systems often fail to replicate real-world clinical reasoning, as they are predominantly trained and evaluated on static text and question-answer tasks. These tuning methods and benchmarks overlook critical aspects like evidence-based reasoning and handling distracting information. To bridge this gap, we introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards. Moreover, we explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes. Experiments show that dialogue-tuned models outperform traditional methods, with improvements of in multi-round reasoning scenarios and in accuracy in a noisy environment. Our findings highlight dialogue tuning as a promising approach for advancing…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLegal Education and Practice Innovations
