HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Junying Chen; Zhenyang Cai; Ke Ji; Xidong Wang; Wanlong Liu; Rongsheng Wang; Jianye Hou; Benyou Wang

arXiv:2412.18925·cs.CL·December 30, 2024·3 cites

Open Access · 1 Repo · 10 Models · 5 Datasets

TL;DR

HuatuoGPT-o1 is a specialized medical language model that leverages verifiable problems and reinforcement learning to enhance complex reasoning in medical diagnosis and problem-solving.

Contribution

The paper introduces a novel two-stage training approach using verifiable medical problems and RL to improve reasoning in a medical LLM, HuatuoGPT-o1.
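A "verifiable" problem, as the term is used here, is one whose model output can be checked against a reference answer, which in turn makes a reward signal possible. The sketch below illustrates that idea only: the `VerifiableProblem` class, the substring-matching `verify` function, and the reward values 1.0 and 0.0 are all hypothetical stand-ins chosen for illustration, not the paper's verifier or its reward scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiableProblem:
    """A question paired with one checkable reference answer (hypothetical type)."""
    question: str
    ground_truth: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def verify(problem: VerifiableProblem, model_answer: str) -> bool:
    """Toy verifier (assumption): accept if the reference answer appears in the output."""
    return normalize(problem.ground_truth) in normalize(model_answer)


def reward(problem: VerifiableProblem, model_answer: str,
           correct: float = 1.0, incorrect: float = 0.0) -> float:
    """Map the verifier's verdict to a scalar reward; the values are illustrative."""
    return correct if verify(problem, model_answer) else incorrect


problem = VerifiableProblem(
    question="Which vitamin deficiency causes scurvy?",
    ground_truth="Vitamin C",
)
print(reward(problem, "The answer is vitamin C."))
print(reward(problem, "Vitamin D"))
```

Substring matching is far too weak for real clinical answers (synonyms, negations, partial matches); it is used here only so the example runs without a model.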

Findings

01. HuatuoGPT-o1 outperforms baselines in medical reasoning tasks.

02. Verifiable problems effectively guide model training.

03. Reinforcement learning further enhances reasoning capabilities.

Abstract

The breakthrough of OpenAI o1 highlights the potential of enhancing reasoning to improve LLMs. Yet most research on reasoning has focused on mathematical tasks, leaving domains like medicine underexplored. The medical domain, though distinct from mathematics, also demands robust reasoning to provide reliable answers, given the high standards of healthcare. However, verifying medical reasoning is challenging, unlike verifying mathematical reasoning. To address this, we propose verifiable medical problems with a medical verifier to check the correctness of model outputs. This verifiable nature enables advancements in medical reasoning through a two-stage approach: (1) using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs, (2) applying reinforcement learning (RL) with verifier-based rewards to enhance complex reasoning further. Finally, we introduce…
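Stage (1) of the abstract, verifier-guided search for a reasoning trajectory, can be pictured as a retry loop: propose reasoning and an answer, ask the verifier, and continue until the answer is accepted or a budget runs out. The following is a minimal sketch under stated assumptions. `search_trajectory`, its `propose` and `is_correct` callables, the `max_attempts` budget, and the scripted proposer are all hypothetical names introduced here; the paper's actual search procedure may differ.

```python
from typing import Callable, List, Optional, Tuple

# A proposer takes the question plus earlier reasoning steps and returns
# (new_reasoning_step, candidate_answer). In practice this would be an LLM call.
Proposer = Callable[[str, List[str]], Tuple[str, str]]


def search_trajectory(
    question: str,
    propose: Proposer,
    is_correct: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    """Retry until the verifier accepts an answer, then return every step taken.

    Rejected attempts stay in the trajectory, so an accepted trace records the
    revisions that led to the verified answer. Returns None if the budget runs out.
    """
    trajectory: List[str] = []
    for _ in range(max_attempts):
        reasoning, answer = propose(question, trajectory)
        trajectory.append(reasoning)
        if is_correct(answer):
            return trajectory
    return None


def scripted_propose(question: str, history: List[str]) -> Tuple[str, str]:
    """Stand-in for a model: the first attempt is wrong, the second is revised."""
    attempts = [
        ("Bone pain suggests vitamin D deficiency.", "Vitamin D"),
        ("Reconsidering: bleeding gums point to vitamin C deficiency.", "Vitamin C"),
    ]
    return attempts[len(history)]


trace = search_trajectory(
    "Which vitamin deficiency causes scurvy?",
    scripted_propose,
    is_correct=lambda answer: answer == "Vitamin C",
)
print(trace)
```

Only traces that end in a verified answer are returned, which is what would make them usable as fine-tuning data in a setup like the one the abstract describes.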

Code & Models

Repositories

freedomintelligence/huatuogpt-o1
PyTorch · Official

Taxonomy

Topics: Machine Learning in Healthcare · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling