M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following
Ruirui Gao, Emily Johnson, Bowen Tan, Yanfei Qian

TL;DR
M3PO is a data-efficient preference optimization method for Large Vision-Language Models (LVLMs). It uses model-guided selection of informative training pairs to improve visual instruction following.
Contribution
The paper presents M3PO, a novel approach that combines external quality assessment and internal confidence to select high-value preference samples for fine-tuning LVLMs.
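To make the selection idea concrete, here is a minimal sketch of choosing one preference pair from a pool of generated candidates. It is not the authors' algorithm: this summary does not say how the two signals are computed or combined. The `Candidate` fields, the `quality_gap` threshold, and the rule "the rejected response is the low-quality candidate the model is most confident in" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    text: str
    quality: float     # hypothetical external quality score, higher is better
    confidence: float  # hypothetical internal confidence, e.g. mean token log-prob


def select_preference_pair(
    candidates: List[Candidate], quality_gap: float = 0.2
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return (chosen, rejected), or None if no usable pair exists.

    Assumed rule: the chosen response has the highest external quality;
    the rejected response is a "hard negative", i.e. clearly worse in
    quality yet the one the model is most confident about.
    """
    if not candidates:
        return None
    chosen = max(candidates, key=lambda c: c.quality)
    # Keep only candidates that are worse than the chosen one by a clear margin.
    negatives = [c for c in candidates if chosen.quality - c.quality >= quality_gap]
    if not negatives:
        return None
    rejected = max(negatives, key=lambda c: c.confidence)
    return chosen, rejected


pool = [
    Candidate("a", quality=0.9, confidence=-0.5),
    Candidate("b", quality=0.3, confidence=-0.2),
    Candidate("c", quality=0.2, confidence=-1.5),
    Candidate("d", quality=0.85, confidence=-0.1),
]
pair = select_preference_pair(pool)
```

With this pool, candidate "d" is excluded as a negative because its quality is too close to the chosen response, so the pair carries a clear preference signal rather than a near-tie.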
Findings
M3PO outperforms SFT, RLHF, DPO, and RM-DPO baselines.
It achieves superior results across multiple multimodal instruction benchmarks.
The method enhances the efficiency and effectiveness of preference-based fine-tuning.
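For context on the DPO baseline named above, the following sketch computes the standard DPO loss for a single preference pair from sequence log-probabilities. It shows the published DPO objective, not M3PO's own training loss, which this summary does not give. The function name, argument names, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * margin)), where margin is the difference
    between the policy-vs-reference log-ratio of the chosen response and
    that of the rejected response.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: the margin is zero and the loss is log(2).
neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy favors the chosen response more than the reference does: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. This is why the choice of pairs matters for data efficiency.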
Abstract
Large Vision-Language Models (LVLMs) hold immense potential for complex multimodal instruction following, yet their development is often hindered by the high cost and inconsistency of human annotation required for effective fine-tuning and preference alignment. Traditional supervised fine-tuning (SFT) and existing preference optimization methods like RLHF and DPO frequently struggle to efficiently leverage the model's own generation space to identify highly informative "hard negative" samples. To address these challenges, we propose Multimodal-Model-Guided Preference Optimization (M3PO), a novel and data-efficient method designed to enhance LVLMs' capabilities in visual instruction following. M3PO intelligently selects the most "learning-valuable" preference sample pairs from a diverse pool of LVLM-generated candidates. This selection is driven by a sophisticated mechanism that…
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Scheduling and Timetabling Solutions
