Med-REFL: Medical Reasoning Enhancement via Self-Corrected Fine-grained Reflection
Zongxian Yang, Jiayu Qian, Zegao Peng, Haoyu Zhang, Yu-An Huang, KC Tan, Zhi-An Huang

TL;DR
Med-REFL introduces a self-correcting framework for medical reasoning models that automatically generates reflection data, significantly improving accuracy and reliability in high-stakes medical AI applications.
Contribution
It presents a novel, label-free reflection learning method that enhances reasoning accuracy by automatically assessing and correcting model fallacies in medical domains.
Findings
Boosts Llama3.1-8B performance on MedQA by 5.82%
Achieves state-of-the-art results with Med-REFL-8B among 7-8B models
Generalizes to logical reasoning and reduces fake reflection phenomena
Abstract
Large reasoning models excel in domains like mathematics where intermediate reasoning is straightforward to verify, but struggle to self-correct in medical fields where evaluating intermediate reasoning is cumbersome and expensive. This verification bottleneck hinders the development of reliable AI reasoners for high-stakes applications. Here we propose Med-REFL, a novel framework that learns fine-grained reflection without human labels or model distillation. Med-REFL introduces a deterministic structural assessment of the reasoning space to automatically generate preference data for reflection. By globally evaluating all explored reasoning paths in a tree-of-thoughts, our method quantifies the value of corrective actions, enabling the automated construction of direct preference optimization pairs. This trains the model to recognize and amend its own reasoning fallacies. Extensive…
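The idea of scoring explored reasoning paths and turning the scores into preference pairs can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the `value` function (fraction of correct leaves under a step), the `margin` threshold, and the toy tree are all hypothetical, and the sketch assumes each complete path can be marked correct or incorrect by comparing its final answer to a reference answer. It also pairs sibling steps only, whereas the paper's pairs target reflection and correction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One reasoning step in a hypothetical tree-of-thoughts."""
    text: str
    children: List["Node"] = field(default_factory=list)
    # Only meaningful for leaves: did the final answer match the reference?
    correct: Optional[bool] = None


def leaf_counts(node: Node) -> Tuple[int, int]:
    """Return (correct leaves, total leaves) in the subtree rooted at node."""
    if not node.children:
        return (1 if node.correct else 0, 1)
    good = total = 0
    for child in node.children:
        g, t = leaf_counts(child)
        good += g
        total += t
    return good, total


def value(node: Node) -> float:
    """Assumed step value: share of paths through this step that end correctly."""
    good, total = leaf_counts(node)
    return good / total


def preference_pairs(root: Node, margin: float = 0.5):
    """Collect (prefix, chosen_step, rejected_step) triples from sibling steps."""
    pairs = []

    def walk(node: Node, prefix: List[str]) -> None:
        path = prefix + [node.text]
        ranked = sorted(node.children, key=value, reverse=True)
        if len(ranked) >= 2:
            best, worst = ranked[0], ranked[-1]
            # Keep a pair only when the value gap is large enough to be informative.
            if value(best) - value(worst) >= margin:
                pairs.append((" ".join(path), best.text, worst.text))
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return pairs


# Toy tree: step A always leads to a correct answer, step B only half the time.
tree = Node("Q", [
    Node("step A", [Node("A1", correct=True), Node("A2", correct=True)]),
    Node("step B", [Node("B1", correct=False), Node("B2", correct=True)]),
])
pairs = preference_pairs(tree)
```

Each triple could then serve as one prompt, chosen, rejected record for a direct preference optimization trainer. How the actual method defines step value and builds reflection-specific pairs is not stated in the truncated abstract above.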
Taxonomy
Topics: Biomedical Text Mining and Ontologies
