DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations
Longtian Qiu, Shan Ning, Chuyu Zhang, Jiaxuan Sun, Xuming He

TL;DR
DA-DPO is a cost-effective, difficulty-aware preference optimization framework that reduces hallucinations in multimodal large language models by shifting the learning focus away from easily distinguishable preference pairs and toward harder ones.
Contribution
It proposes a novel difficulty estimation and reweighting method for preference optimization, improving hallucination mitigation without additional data or fine-tuning.
Findings
Enhanced robustness to hallucinations across benchmarks
Improved generalization in multimodal preference optimization
Maintained computational efficiency
Abstract
Direct Preference Optimization (DPO) has shown strong potential for mitigating hallucinations in Multimodal Large Language Models (MLLMs). However, existing multimodal DPO approaches often suffer from overfitting due to the difficulty imbalance in preference data. Our analysis shows that MLLMs tend to overemphasize easily distinguishable preference pairs, which hinders fine-grained hallucination suppression and degrades overall performance. To address this issue, we propose Difficulty-Aware Direct Preference Optimization (DA-DPO), a cost-effective framework designed to balance the learning process. DA-DPO consists of two main components: (1) Difficulty Estimation leverages pre-trained vision-language models with complementary generative and contrastive objectives, whose outputs are integrated via a distribution-aware voting strategy to produce robust difficulty scores without…
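The abstract is cut off before the second component is described, so the following is only a minimal sketch of how a per-pair difficulty score could be used to rebalance a DPO objective. The per-pair loss is the standard DPO formulation; the multiplicative weighting, the function names, and the assumption that difficulty scores lie in [0, 1] with higher meaning harder are illustrative choices, not the authors' method.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence-level log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def difficulty_weighted_dpo_loss(pairs, difficulties, beta=0.1):
    """Weighted mean of per-pair DPO losses (hypothetical weighting scheme).

    pairs: list of 4-tuples in the argument order of dpo_pair_loss.
    difficulties: assumed scores in [0, 1]; higher means a harder pair.
    """
    if len(pairs) != len(difficulties):
        raise ValueError("pairs and difficulties must have the same length")
    total_weight = sum(difficulties)
    if total_weight <= 0:
        raise ValueError("at least one difficulty score must be positive")
    weighted = sum(w * dpo_pair_loss(*p, beta=beta)
                   for p, w in zip(pairs, difficulties))
    return weighted / total_weight


# An easy pair (policy already separates the responses) and a hard pair.
easy_pair = (-1.0, -5.0, -2.0, -3.0)
hard_pair = (-2.0, -2.1, -2.0, -2.0)
loss = difficulty_weighted_dpo_loss([easy_pair, hard_pair], [0.1, 0.9])
```

With these weights the hard pair dominates the batch loss, which is the qualitative behaviour the abstract motivates: easy pairs contribute less to the update than they would under a uniform average.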
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
