HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wang

TL;DR
HuatuoGPT-o1 is a specialized medical language model that leverages verifiable problems and reinforcement learning to enhance complex reasoning in medical diagnosis and problem-solving.
Contribution
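The paper introduces a two-stage training approach for a medical LLM, HuatuoGPT-o1: a medical verifier first guides the search for complex reasoning trajectories used in fine-tuning, and reinforcement learning (RL) with verifier-based rewards then strengthens that reasoning.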
Findings
HuatuoGPT-o1 outperforms baselines in medical reasoning tasks.
Verifiable problems effectively guide model training.
Reinforcement learning further enhances reasoning capabilities.
Abstract
The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLM. Yet, most research in reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning is challenging, unlike those in mathematics. To address this, we propose verifiable medical problems with a medical verifier to check the correctness of model outputs. This verifiable nature enables advancements in medical reasoning through a two-stage approach: (1) using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs, (2) applying reinforcement learning (RL) with verifier-based rewards to enhance complex reasoning further. Finally, we introduce…
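The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (toy_verifier, search_trajectory, verifier_reward, toy_generate) is hypothetical, the verifier is a plain string match standing in for the paper's medical verifier, and the reward values 1.0 and 0.0 are assumptions chosen only for illustration.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type aliases; none of these names come from the paper.
Verifier = Callable[[str, str], bool]               # (answer, reference) -> is it correct?
Generator = Callable[[str, int], Tuple[str, str]]   # (question, attempt) -> (reasoning, answer)


def toy_verifier(answer: str, reference: str) -> bool:
    """Stand-in check: case-insensitive exact match.

    Assumption: the paper uses a medical verifier model here, not string matching.
    """
    return answer.strip().lower() == reference.strip().lower()


def search_trajectory(
    question: str,
    reference: str,
    generate: Generator,
    verify: Verifier,
    max_attempts: int = 3,
) -> Optional[Tuple[str, str]]:
    """Stage 1 sketch: retry until the verifier accepts an answer.

    The accepted (reasoning, answer) pair would be kept as fine-tuning data.
    Returns None if no attempt passes verification.
    """
    for attempt in range(max_attempts):
        reasoning, answer = generate(question, attempt)
        if verify(answer, reference):
            return reasoning, answer
    return None


def verifier_reward(
    answer: str,
    reference: str,
    verify: Verifier,
    correct: float = 1.0,    # assumed value, for illustration only
    incorrect: float = 0.0,  # assumed value, for illustration only
) -> float:
    """Stage 2 sketch: turn the verifier's verdict into a scalar RL reward."""
    return correct if verify(answer, reference) else incorrect


def toy_generate(question: str, attempt: int) -> Tuple[str, str]:
    """Hypothetical generator: a wrong first attempt, then a revised answer."""
    candidates = [
        ("Initial guess from the headline symptom.", "migraine"),
        ("Revised after rechecking the history.", "tension headache"),
    ]
    return candidates[min(attempt, len(candidates) - 1)]


result = search_trajectory(
    "Which diagnosis best fits the described headache?",
    "Tension headache",
    toy_generate,
    toy_verifier,
)
```

In this toy run the first attempt is rejected and the second is accepted, so result holds the revised trajectory. In a real pipeline the generator would be an LLM and the verifier a model such as the released medical_o1_verifier_3B; how those are called is outside what this page states.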
Code & Models
- 🤗 FreedomIntelligence/HuatuoGPT-o1-7B · model · 302 dl · ♡ 56
- 🤗 FreedomIntelligence/HuatuoGPT-o1-8B · model · 565 dl · ♡ 57
- 🤗 FreedomIntelligence/HuatuoGPT-o1-70B · model · 87 dl · ♡ 15
- 🤗 FreedomIntelligence/HuatuoGPT-o1-72B · model · 77 dl · ♡ 34
- 🤗 FreedomIntelligence/medical_o1_verifier_3B · model · 288 dl · ♡ 19
- 🤗 QuantFactory/HuatuoGPT-o1-8B-GGUF · model · 279 dl · ♡ 3
- 🤗 QuantFactory/HuatuoGPT-o1-7B-GGUF · model · 172 dl · ♡ 6
- 🤗 Yujivus/Phi-4-Health-CoT-1.1-AWQ · model · 2 dl
- 🤗 cgus/HuatuoGPT-o1-7B-exl2 · model
- 🤗 FreedomIntelligence/medical_o1_verifier_3B_Qwen2.5 · model · 86 dl · ♡ 6
- FreedomIntelligence/medical-o1-reasoning-SFT · dataset · 6.0k dl
- FreedomIntelligence/medical-o1-verifiable-problem · dataset · 1.6k dl
- ChuGyouk/medical-o1-reasoning-SFT-Ko · dataset · 38 dl
- PocketDoc/Dans-Logicmaxx-FI-VeriMed · dataset · 26 dl
- FreedomIntelligence/Medical-R1-Distill-Data · dataset · 132 dl
Taxonomy
Topics: Machine Learning in Healthcare · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
