Benchmarking Direct Preference Optimization for Medical Large Vision-Language Models

Dain Kim; Jiwoo Lee; Jaehoon Yun; Yong Hoe Koo; Qingyu Chen; Hyunjae Kim; Jaewoo Kang

arXiv:2601.17918 · cs.CV · January 27, 2026

TL;DR

This paper evaluates various Direct Preference Optimization (DPO) methods for medical vision-language models, identifying limitations and proposing a targeted strategy that improves visual question-answering accuracy by 3.6%.

Contribution

It provides the first comprehensive empirical analysis of DPO variants in medical LVLMs and introduces a new preference construction method to address visual misinterpretation errors.

Findings

1. DPO approaches show inconsistent improvements over supervised fine-tuning.

2. Current DPO methods often fail to fix visual misinterpretation errors.

3. A targeted preference strategy improves visual QA performance by 3.6%.

Abstract

Large Vision-Language Models (LVLMs) hold significant promise for medical applications, yet their deployment is often constrained by insufficient alignment and reliability. While Direct Preference Optimization (DPO) has emerged as a potent framework for refining model responses, its efficacy in high-stakes medical contexts remains underexplored, lacking the rigorous empirical groundwork necessary to guide future methodological advances. To bridge this gap, we present the first comprehensive examination of diverse DPO variants within the medical domain, evaluating nine distinct formulations across two medical LVLMs: LLaVA-Med and HuatuoGPT-Vision. Our results reveal several critical limitations: current DPO approaches often yield inconsistent gains over supervised fine-tuning, with their efficacy varying significantly across different tasks and backbones. Furthermore, they frequently…
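For readers unfamiliar with the objective being benchmarked, the following is a minimal sketch of the standard (vanilla) DPO loss for a single preference pair. It is not taken from the paper and does not represent any of the nine variants or the proposed preference construction strategy; the function name, argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are assumed to be summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Vanilla DPO loss for one (chosen, rejected) response pair.

    All names and the default beta are hypothetical, chosen for illustration.
    Each argument is a sequence-level log-probability (a float).
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy favors the chosen response more than the reference does:
    # the loss drops below log(2).
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model; `beta` controls how strongly deviations from the reference are weighted.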

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)