Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

Baichuan-M3 Team: Chengfeng Dou; Fan Yang; Fei Li; Jiyuan Jia; Qiang Ju; Shuai Wang; Tianpeng Li; Xiangrong Zeng; Yijie Zhou; Hongda Zhang; Jinyang Tai; Linzhuang Sun; Peidong Guo; Yichuan Mo; Xiaochuan Wang; Hengfu Cui; Zhishou Zhang

arXiv:2602.06570·cs.CL·February 9, 2026

Open Access · 4 Models

TL;DR

Baichuan-M3 is a specialized medical large language model designed for active clinical decision support. It features proactive information gathering, long-horizon reasoning, and hallucination suppression, and achieves state-of-the-art results on medical inquiry tasks.

Contribution

It introduces Baichuan-M3, a novel medical LLM that models physician workflows for improved clinical decision-making and outperforms existing models on new medical benchmarks.

Findings

01. Achieves state-of-the-art results on HealthBench, HealthBench-Hallu, and ScanBench.

02. Significantly outperforms GPT-5.2 in clinical inquiry, advisory, and safety.

03. Models are publicly available for research and development.

Abstract

We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at…

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education