Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization
Wenqi Liu, Xuemeng Song, Jiaxi Li, Yinwei Wei, Na Zheng, Jianhua Yin, Liqiang Nie

TL;DR
This paper introduces SymMPO, a novel symmetric preference optimization method that directly supervises multimodal models to reduce hallucinations, outperforming existing vision-oriented contrastive approaches.
Contribution
The paper proposes SymMPO, a preference optimization method that uses direct preference supervision (response pairs), stays theoretically consistent with standard DPO, and adds a preference margin consistency loss to improve hallucination mitigation in MLLMs.
Findings
SymMPO outperforms existing methods on five benchmarks.
SymMPO effectively reduces hallucinations in multimodal models.
The approach maintains rigorous theoretical alignment with standard DPO.
Abstract
Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). Existing methods have made significant progress by using vision-oriented contrastive objectives to strengthen MLLMs' attention to visual inputs and thereby reduce hallucination, but they suffer from a non-rigorous optimization objective and indirect preference supervision. To address these limitations, we propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs) to enhance visual understanding, while maintaining rigorous theoretical alignment with standard DPO. In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss to quantitatively regulate the…
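For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not SymMPO's objective: the abstract is truncated here, so the symmetric term and the preference margin consistency loss are not reproduced. The function names, argument names, and the default `beta` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each response under the policy and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; not taken from the paper
) -> tuple[float, float]:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Returns (loss, margin), where margin is the difference between the
    implicit rewards of the chosen and rejected responses.
    """
    # Implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x))
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == softplus(-margin)
    loss = softplus(-margin)
    return loss, margin
```

When the policy matches the reference model, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen response than the reference does.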
