The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality
Dou Liu, Ying Long, Sophia Zuoqiu, Kaipeng Xie, Runze Yang, Di Liu, Kang Li, Yiting Lin, Hanyi Liu, Rong Yin, Tian Tang

TL;DR
This study evaluates four alignment strategies for medical large language models in infertility care and finds that higher algorithmic accuracy does not necessarily bring greater clinician trust or interpretability, a gap the authors call the alignment paradox.
Contribution
The paper systematically compares four alignment methods (SFT, DPO, GRPO, and ICL) and shows that reinforcement-based optimization improves accuracy but may reduce clinical trust and interpretability.
Findings
GRPO achieves the highest algorithmic accuracy
Clinicians prefer SFT for interpretability and feasibility
Algorithmic improvements do not always increase clinical trust
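For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage at the core of the general method: several responses are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. This is not the authors' implementation; the function name, the `eps` term, and the example rewards are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper for illustration only: `rewards` holds one scalar
    reward per response sampled for the same prompt.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps (an assumed stabilizer) avoids division by zero when all rewards tie
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards: 1.0 if a sampled treatment plan matched the record, else 0.0
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group average receive a positive advantage and are reinforced, while those below it are discouraged. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.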
Abstract
Large language models (LLMs) are increasingly adopted in clinical decision support, yet aligning them with the multifaceted reasoning pathways of real-world medicine remains a major challenge. Using more than 8,000 infertility treatment records, we systematically evaluate four alignment strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Group Relative Policy Optimization (GRPO), and In-Context Learning (ICL) through a dual-layer framework combining automatic benchmarks with blinded doctor-in-the-loop assessments. GRPO achieves the highest algorithmic accuracy across multiple decision layers, confirming the value of reinforcement-based optimization for structured prediction tasks. However, clinicians consistently prefer the SFT model, citing clearer reasoning processes (p = 0.035) and higher therapeutic feasibility (p = 0.019). In blinded pairwise…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Genomics and Rare Diseases
