Automated Multi-level Preference for MLLMs
Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao,, Jianbo Zhao, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang

TL;DR
This paper introduces the AMP framework for improving multimodal large language models by utilizing automated multi-level preferences and a new preference optimization algorithm to reduce hallucinations and enhance response quality.
Contribution
It proposes a novel multi-level preference learning approach with an automated dataset pipeline and a direct preference optimization algorithm for MLLMs.
Findings
Improved MLLM performance on hallucination benchmarks.
Effective reduction of hallucinations in multimodal responses.
Enhanced subtlety detection in model responses.
Abstract
Current multimodal Large Language Models (MLLMs) suffer from ``hallucination'', occasionally generating responses that are not grounded in the input images. To tackle this challenge, one promising path is to utilize reinforcement learning from human feedback (RLHF), which steers MLLMs towards learning superior responses while avoiding inferior ones. We rethink the common practice of using binary preferences (i.e., superior, inferior), and find that adopting multi-level preferences (e.g., superior, medium, inferior) is better for two benefits: 1) It narrows the gap between adjacent levels, thereby encouraging MLLMs to discern subtle differences. 2) It further integrates cross-level comparisons (beyond adjacent-level comparisons), thus providing a broader range of comparisons with hallucination examples. To verify our viewpoint, we present the Automated Multi-level Preference (AMP)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsSpeech and dialogue systems · Semantic Web and Ontologies · Natural Language Processing Techniques
