CRAFT: Clinical Reward-Aligned Finetuning for Medical Image Synthesis
Yunsung Chung, Alex El Darzi, Carlo El Khoury, Han Feng, Nassir Marrouche, Jihun Hamm

TL;DR
This paper introduces CRAFT, a reward-based finetuning framework for medical image synthesis that improves clinical alignment and reduces hallucinations in generated images across multiple modalities.
Contribution
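CRAFT transfers medical knowledge from multimodal large language models and vision-language models into image synthesis through clinical reward signals. Its companion metric, the Clinical Alignment Score (CAS), addresses a gap left by traditional metrics such as FID and Inception Score, which do not quantify per-image alignment with pathology-relevant criteria.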
Findings
CRAFT improves the Clinical Alignment Score (CAS) across four modalities.
CRAFT reduces the low-alignment tail by 5.5 to 34.7 percentage points, averaging 20.4 percentage points.
CRAFT enhances downstream classification performance and reduces hallucinations.
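As a rough illustration of what a reduction of the low-alignment tail in percentage points means, the sketch below computes the share of images scoring below a cutoff before and after finetuning and subtracts the two. The cutoff of 0.5, the score values, and the function names are all hypothetical; this summary does not state how the paper defines the tail.

```python
from typing import Sequence


def tail_fraction(scores: Sequence[float], threshold: float) -> float:
    """Percentage of images whose score falls strictly below `threshold`."""
    if not scores:
        raise ValueError("scores must be non-empty")
    below = sum(1 for s in scores if s < threshold)
    return 100.0 * below / len(scores)


def tail_reduction_points(
    baseline: Sequence[float], finetuned: Sequence[float], threshold: float
) -> float:
    """Drop in the low-score tail, in percentage points (positive = smaller tail)."""
    return tail_fraction(baseline, threshold) - tail_fraction(finetuned, threshold)


# Made-up per-image scores for illustration only, not results from the paper.
baseline_scores = [0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.9, 0.1, 0.5, 0.75]
finetuned_scores = [0.6, 0.3, 0.7, 0.8, 0.8, 0.55, 0.9, 0.45, 0.65, 0.75]

# Assumed cutoff; the paper's actual definition of "low alignment" may differ.
THRESHOLD = 0.5

reduction = tail_reduction_points(baseline_scores, finetuned_scores, THRESHOLD)
print(f"Tail reduced by {reduction:.1f} percentage points")
```

With these toy numbers the tail shrinks from 40% to 20% of images, a reduction of 20 percentage points. That is an absolute difference, not a 20% relative change.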
Abstract
Foundation diffusion models can generate photorealistic natural images, but adapting them to medical imaging remains challenging. In medical adaptation, limited labeled data can exacerbate hallucination-like and clinically implausible synthesis, while existing metrics such as FID or Inception Score do not quantify per-image alignment with pathology-relevant criteria. We introduce the Clinical Alignment Score (CAS), a foundation-model-based proxy for clinical alignment that evaluates generated images along four complementary dimensions beyond visual fidelity. Building on CAS, we propose Clinical Reward-Aligned Finetuning (CRAFT), a reward-based adaptation framework that transfers medical knowledge from multimodal large language models and vision-language models through label-conditioned prompt enrichment, clinical checklists, and differentiable reward optimization. Across four diverse…
