Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, and Yong Liu

arXiv:2411.16201·cs.LG·November 26, 2024

Open Access

TL;DR

This paper introduces a high-quality AI-generated video-text preference dataset and a reinforcement learning framework to improve multimodal large language models' alignment, addressing data scarcity and response diversity issues.

Contribution

It presents MMAIP-V, a novel AI-generated preference dataset, and Iter-W2S-RLAIF, a reinforcement learning method for enhancing video MLLMs, advancing preference learning and model alignment.

Findings

01. MMAIP-V improves preference learning for MLLMs.

02. Iter-W2S-RLAIF effectively exploits preference data.

03. Enhanced model alignment demonstrated in experiments.

Abstract

High-quality video-text preference data is crucial for Multimodal Large Language Models (MLLMs) alignment. However, existing preference data is very scarce. Obtaining VQA preference data for preference training is costly, and manually annotating responses is highly unreliable, which could result in low-quality pairs. Meanwhile, AI-generated responses controlled by temperature adjustment lack diversity. To address these issues, we propose a high-quality VQA preference dataset, called Multiple Multimodal Artificial Intelligence Preference Datasets in VQA (MMAIP-V), which is constructed by sampling from the response distribution set and using an external scoring function for response evaluation. Furthermore, to fully leverage the preference knowledge in MMAIP-V and ensure sufficient optimization, we propose…
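The construction idea in the abstract (collect responses from several sources, score them with an external function, keep a preferred and a dispreferred one) might be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the word-overlap scorer are all hypothetical stand-ins, not the paper's actual pipeline or scoring function.

```python
from typing import Callable, Dict, Optional


def build_preference_pair(
    question: str,
    responses: Dict[str, str],
    score_fn: Callable[[str, str], float],
) -> Optional[dict]:
    """Turn responses from several sources into one (chosen, rejected) pair.

    `responses` maps a hypothetical source name (e.g. a model id) to its answer.
    Returns None when no meaningful preference can be formed.
    """
    if len(responses) < 2:
        return None
    # Score every candidate with the external scoring function.
    scored = [(score_fn(question, text), source, text)
              for source, text in responses.items()]
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    if best[0] == worst[0]:
        # All candidates tie, so there is no preference signal.
        return None
    return {
        "question": question,
        "chosen": best[2],
        "rejected": worst[2],
        "chosen_source": best[1],
        "rejected_source": worst[1],
    }


def toy_score(question: str, response: str) -> float:
    """Hypothetical scorer: fraction of question words echoed in the response."""
    q_words = set(question.lower().split())
    r_words = set(response.lower().split())
    return len(q_words & r_words) / max(len(q_words), 1)


example = build_preference_pair(
    "what color is the car in the video",
    {"model_a": "the car in the video is red", "model_b": "a dog runs"},
    toy_score,
)
```

In a real setting the scorer would presumably be a much stronger judge than word overlap; the sketch only shows how drawing candidates from several sources, rather than from one model at different temperatures, feeds into pair selection.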

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)

Methods: Sparse Evolutionary Training