Evaluating an evidence-guided reinforcement learning framework in aligning light-parameter large language models with decision-making cognition in psychiatric clinical reasoning

Xinxin Lin; Guangxin Dai; Yi Zhong; Xiang Li; Xue Xiao; Yixin Zhang; Zhengdong Wu; Yongbo Zheng; Runchuan Zhu; Ming Zhao; Huizi Yu; Shuo Wu; Jun Zhao; Lingming Hu; Yumei Wang; Ping Yin; Joey W.Y. Chan; Ngan Yin Chan; Sijing Chen; Yun Kwok Wing; Lin Lu; Xin Ma; and Lizhou Fan

arXiv:2602.06449 · cs.CL · February 9, 2026

Open Access

TL;DR

This paper introduces ClinMPO, a reinforcement learning framework that aligns light-parameter large language models with psychiatric clinical reasoning, raising their diagnostic performance on complex clinical cases above that of human medical students.

Contribution

The study presents a novel evidence-guided reinforcement learning approach that enhances the reasoning capabilities of light-parameter LLMs for psychiatric decision support.

Findings

01

ClinMPO improves LLM diagnostic accuracy to 31.4%.

02

ClinMPO outperforms human medical students on complex reasoning tasks.

03

Light-parameter LLMs can master complex psychiatric reasoning with evidence-based alignment.

Abstract

Large language models (LLMs) hold transformative potential for medical decision support, yet their application in psychiatry remains constrained by hallucinations and superficial reasoning. This limitation is particularly acute in light-parameter LLMs, which are essential for privacy-preserving and efficient clinical deployment. Existing training paradigms prioritize linguistic fluency over structured clinical logic, resulting in a fundamental misalignment with professional diagnostic cognition. Here we introduce ClinMPO, a reinforcement learning framework designed to align the internal reasoning of LLMs with professional psychiatric practice. The framework employs a specialized reward model, trained independently on a dataset derived from 4,474 psychiatry journal articles and structured according to evidence-based medicine principles. We evaluated ClinMPO on an unseen subset of the…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Clinical Reasoning and Diagnostic Skills