Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Weiyun Wang; Zhe Chen; Wenhai Wang; Yue Cao; Yangzhou Liu; Zhangwei Gao; Jinguo Zhu; Xizhou Zhu; Lewei Lu; Yu Qiao; Jifeng Dai

arXiv:2411.10442 · cs.CL · April 8, 2025 · 3 cites

Open Access · 1 Repo · 10 Models · 5 Datasets

TL;DR

This paper introduces a preference optimization method to improve multimodal reasoning in large language models, significantly enhancing their Chain-of-Thought performance and achieving state-of-the-art results on reasoning benchmarks.

Contribution

The paper proposes Mixed Preference Optimization (MPO) and a large-scale multimodal reasoning dataset, improving reasoning capabilities of MLLMs beyond existing fine-tuning methods.

Findings

01 InternVL2-8B-MPO achieves an accuracy of 67.0 on MathVista.

02 MPO boosts multimodal Chain-of-Thought performance.

03 InternVL2-8B-MPO performs comparably to the 10× larger InternVL2-76B.

Abstract

Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance. To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create MMPR, a high-quality, large-scale multimodal reasoning preference dataset; and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance. Our approach enhances the multimodal reasoning abilities of both InternVL2-8B and InternVL2-76B.…
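
The abstract does not spell out the MPO objective. As a rough illustration of what a preference optimization loss can look like, the sketch below computes a DPO-style pairwise loss from sequence log-probabilities and mixes it with a supervised negative log-likelihood term on the chosen response. The function names, the weights, the value of beta, and the choice of a DPO-style term are all assumptions made for illustration; this is not the paper's definition of MPO.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pairwise_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature, not taken from the paper
) -> float:
    """DPO-style loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


def mixed_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    num_chosen_tokens: int,
    pref_weight: float = 1.0,  # hypothetical weight
    sft_weight: float = 1.0,   # hypothetical weight
) -> float:
    """Weighted sum of a pairwise preference term and a per-token SFT term."""
    pref = dpo_pairwise_loss(
        policy_chosen_logp,
        policy_rejected_logp,
        ref_chosen_logp,
        ref_rejected_logp,
    )
    # Average negative log-likelihood of the chosen response.
    sft = -policy_chosen_logp / num_chosen_tokens
    return pref_weight * pref + sft_weight * sft
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the pairwise term equals log 2; raising the chosen response's log-probability relative to the rejected one pushes the term below that value.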

Code & Models

Repositories

opengvlab/internvl
pytorch

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling

Methods: Parrot optimizer: Algorithm and applications to medical problems