FineMedLM-o1: Enhancing Medical Knowledge Reasoning Ability of LLM from Supervised Fine-Tuning to Test-Time Training

Hongzhou Yu; Tianhao Cheng; Yingwen Wang; Wen He; Qing Wang; Ying Cheng; Yuejie Zhang; Rui Feng; Xiaobo Zhang

arXiv:2501.09213 · cs.CL · July 31, 2025 · 2 cites

Open Access · 1 Repo · 2 Models · 2 Datasets

TL;DR

FineMedLM-o1 improves medical reasoning in large language models by combining supervised fine-tuning and preference optimization on high-quality synthetic data with test-time training, reporting a 23% average performance improvement over prior models on key medical benchmarks.

Contribution

The paper introduces FineMedLM-o1, which combines supervised fine-tuning, Direct Preference Optimization, and test-time training (applied to the medical domain for the first time) to enhance medical reasoning in LLMs.

Findings

01 23% average performance improvement over prior models on key medical benchmarks

02 Test-Time Training adds a further 14% boost

03 Proposes a high-quality synthetic medical dialogue dataset

Abstract

Recent advancements in large language models (LLMs) have shown promise in medical applications such as disease diagnosis and treatment planning. However, most existing medical LLMs struggle with the deep reasoning required for complex medical problems, such as differential diagnosis and medication recommendations. We propose FineMedLM-o1, which leverages high-quality medical synthetic data and long-form reasoning data for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), enabling advanced dialogue and deep reasoning capabilities. Additionally, we introduce Test-Time Training (TTT) in the medical domain for the first time, facilitating domain adaptation and ensuring reliable, accurate reasoning. Experimental results demonstrate that FineMedLM-o1 achieves a 23% average performance improvement over prior models on key medical benchmarks. Furthermore, the introduction…

Code & Models

Repositories

hongzhouyu/finemed (PyTorch, official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Intelligent Tutoring Systems and Adaptive Learning · Clinical Reasoning and Diagnostic Skills