M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following

Ruirui Gao; Emily Johnson; Bowen Tan; Yanfei Qian

arXiv:2508.12458 · cs.CL · August 19, 2025

TL;DR

M3PO introduces a data-efficient preference optimization method for LVLMs, leveraging model-guided selection of informative training pairs to improve visual instruction following performance.

Contribution

The paper presents M3PO, a novel approach that combines external quality assessment and internal confidence to select high-value preference samples for fine-tuning LVLMs.
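The general idea of combining an external quality score with the model's own confidence to pick a preference pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual scoring function, which the listing here does not spell out. The names `Candidate`, `quality`, `mean_logprob`, `select_pair`, and the weight `alpha` are all hypothetical. The sketch treats the highest-quality candidate as "chosen" and prefers as "rejected" a response that is both clearly worse and confidently generated by the model (a "hard negative").

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    text: str
    quality: float       # hypothetical external judge score in [0, 1]
    mean_logprob: float  # hypothetical mean per-token log-prob under the policy (<= 0)


def confidence(c: Candidate) -> float:
    """Map the mean log-probability to a confidence value in (0, 1]."""
    return math.exp(c.mean_logprob)


def select_pair(
    cands: List[Candidate], alpha: float = 0.5
) -> Optional[Tuple[Candidate, Candidate, float]]:
    """Return (chosen, rejected, score), or None if no valid pair exists.

    Assumed scoring rule (illustrative only):
        score = quality gap + alpha * confidence of the rejected candidate
    """
    if len(cands) < 2:
        return None
    chosen = max(cands, key=lambda c: c.quality)
    best: Optional[Tuple[Candidate, Candidate, float]] = None
    for c in cands:
        gap = chosen.quality - c.quality
        if c is chosen or gap <= 0:
            continue  # skip the chosen response and exact quality ties
        score = gap + alpha * confidence(c)
        if best is None or score > best[2]:
            best = (chosen, c, score)
    return best


pool = [
    Candidate("A", quality=0.9, mean_logprob=-0.5),
    Candidate("B", quality=0.2, mean_logprob=-0.1),  # poor but confident
    Candidate("C", quality=0.1, mean_logprob=-3.0),  # poor and unconfident
]
result = select_pair(pool)
```

In this toy pool, candidate B is preferred over C as the rejected response even though C has the lower quality score, because the model assigns B much higher confidence. Pairs selected this way could then be passed to a standard preference-optimization objective such as DPO.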

Findings

01. M3PO outperforms SFT, RLHF, DPO, and RM-DPO baselines.

02. It achieves superior results across multiple multimodal instruction benchmarks.

03. The method enhances the efficiency and effectiveness of preference-based fine-tuning.

Abstract

Large Vision-Language Models (LVLMs) hold immense potential for complex multimodal instruction following, yet their development is often hindered by the high cost and inconsistency of human annotation required for effective fine-tuning and preference alignment. Traditional supervised fine-tuning (SFT) and existing preference optimization methods like RLHF and DPO frequently struggle to efficiently leverage the model's own generation space to identify highly informative "hard negative" samples. To address these challenges, we propose Multimodal-Model-Guided Preference Optimization (M3PO), a novel and data-efficient method designed to enhance LVLMs' capabilities in visual instruction following. M3PO intelligently selects the most "learning-valuable" preference sample pairs from a diverse pool of LVLM-generated candidates. This selection is driven by a sophisticated mechanism that…

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Scheduling and Timetabling Solutions