CRAFT: Clinical Reward-Aligned Finetuning for Medical Image Synthesis

Yunsung Chung; Alex El Darzi; Carlo El Khoury; Han Feng; Nassir Marrouche; Jihun Hamm

arXiv:2605.12650 · cs.CV · May 14, 2026


TL;DR

This paper introduces CRAFT, a reward-based finetuning framework for medical image synthesis that improves clinical alignment and reduces hallucinations in generated images across multiple modalities.

Contribution

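CRAFT uses clinical reward signals derived from multimodal large language models and vision-language models to finetune diffusion models for medical image synthesis. It addresses the failure of standard metrics such as FID and Inception Score to measure per-image clinical alignment.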

Findings

01

CRAFT improves the Clinical Alignment Score (CAS) across four modalities.

02

CRAFT reduces the low-alignment tail by 5.5 to 34.7 percentage points, 20.4 points on average.

03

CRAFT enhances downstream classification performance and reduces hallucinations.
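Finding 02 reports the tail reduction in percentage points. This page does not define the low-alignment tail, so the following is a minimal sketch assuming it is the share of generated images whose CAS falls below a fixed threshold. The threshold, the function names, and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def low_alignment_tail(scores: Sequence[float], threshold: float) -> float:
    """Percentage of images whose CAS is below `threshold` (assumed tail definition)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    below = sum(1 for s in scores if s < threshold)
    return 100.0 * below / len(scores)


def tail_reduction_pp(baseline: Sequence[float], finetuned: Sequence[float], threshold: float) -> float:
    """Drop in tail share, in percentage points (baseline tail minus finetuned tail)."""
    return low_alignment_tail(baseline, threshold) - low_alignment_tail(finetuned, threshold)


# Made-up CAS values on a 0-1 scale, for illustration only (not results from the paper).
baseline = [0.2, 0.35, 0.6, 0.8, 0.45]
finetuned = [0.55, 0.4, 0.7, 0.85, 0.65]
print(low_alignment_tail(baseline, 0.5))             # 60.0
print(low_alignment_tail(finetuned, 0.5))            # 20.0
print(tail_reduction_pp(baseline, finetuned, 0.5))   # 40.0
```

A percentage-point drop is not the same as a relative drop. In this toy case the tail shrinks by 40 points, which is a 66.7% relative reduction of the baseline's 60% tail.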

Abstract

Foundation diffusion models can generate photorealistic natural images, but adapting them to medical imaging remains challenging. In medical adaptation, limited labeled data can exacerbate hallucination-like and clinically implausible synthesis, while existing metrics such as FID or Inception Score do not quantify per-image alignment with pathology-relevant criteria. We introduce the Clinical Alignment Score (CAS), a foundation-model-based proxy for clinical alignment that evaluates generated images along four complementary dimensions beyond visual fidelity. Building on CAS, we propose Clinical Reward-Aligned Finetuning (CRAFT), a reward-based adaptation framework that transfers medical knowledge from multimodal large language models and vision-language models through label-conditioned prompt enrichment, clinical checklists, and differentiable reward optimization. Across four diverse…
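The abstract describes CAS as a foundation-model-based score over four dimensions and says CRAFT relies on clinical checklists. The following is a minimal sketch of checklist-style scoring under stated assumptions: each dimension is scored as the fraction of yes/no checklist items that a judge model affirms, and CAS is the unweighted mean of the four dimension scores. The dimension names, questions, judge stub, and aggregation rule are all hypothetical and are not the paper's definition.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical judge interface: (image_id, yes/no question) -> True if the judge
# (for example, a VLM queried with the question) answers "yes". Stubbed out below.
Judge = Callable[[str, str], bool]


@dataclass
class Dimension:
    name: str             # placeholder name; the paper's four dimensions are not listed here
    checklist: List[str]  # yes/no clinical checklist questions for this dimension


def dimension_score(image_id: str, dim: Dimension, judge: Judge) -> float:
    """Fraction of this dimension's checklist items that the judge marks as satisfied."""
    return sum(judge(image_id, q) for q in dim.checklist) / len(dim.checklist)


def clinical_alignment_score(image_id: str, dims: List[Dimension], judge: Judge) -> Dict[str, float]:
    """Per-dimension scores plus their unweighted mean under the key 'CAS' (assumed aggregation)."""
    scores = {d.name: dimension_score(image_id, d, judge) for d in dims}
    scores["CAS"] = mean([scores[d.name] for d in dims])
    return scores


# Toy usage: four placeholder dimensions with two checklist items each,
# and a stub judge that returns fixed answers instead of querying a model.
answers = {"q1": True, "q2": False, "q3": True, "q4": True,
           "q5": False, "q6": False, "q7": True, "q8": True}
dims = [Dimension(f"dim_{i + 1}", [f"q{2 * i + 1}", f"q{2 * i + 2}"]) for i in range(4)]
scores = clinical_alignment_score("img_001", dims, lambda image_id, q: answers[q])
print(scores)  # {'dim_1': 0.5, 'dim_2': 1.0, 'dim_3': 0.0, 'dim_4': 1.0, 'CAS': 0.625}
```

A score built from yes/no answers is not differentiable. Using such a score as the reward in differentiable reward optimization would therefore require a differentiable surrogate, and that step is not shown in this sketch.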
