Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, and Yong Liu

TL;DR
This paper introduces a high-quality, AI-generated video-text preference dataset and a reinforcement learning framework to improve the alignment of multimodal large language models (MLLMs). Together they address the scarcity of preference data and the limited diversity of sampled responses.
Contribution
It presents MMAIP-V, a novel AI-generated video-text preference dataset. It also presents Iter-W2S-RLAIF, an iterative weak-to-strong reinforcement learning from AI feedback (RLAIF) method for improving video MLLMs. Together they advance preference learning and model alignment.
Findings
MMAIP-V improves preference learning for MLLMs.
Iter-W2S-RLAIF effectively exploits preference data.
Enhanced model alignment demonstrated in experiments.
Abstract
High-quality video-text preference data is crucial for aligning Multimodal Large Language Models (MLLMs). However, existing preference data is scarce. Obtaining VQA preference data for preference training is costly. Manually annotating responses is also highly unreliable and can produce low-quality pairs. Meanwhile, AI-generated responses whose variation comes only from temperature adjustment lack diversity. To address these issues, we propose a high-quality VQA preference dataset called Multiple Multimodal Artificial Intelligence Preference Datasets in VQA (MMAIP-V). It is constructed by sampling from the response distribution set and evaluating responses with an external scoring function. Furthermore, to fully leverage the preference knowledge in MMAIP-V and ensure sufficient optimization, we propose…
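The abstract describes the dataset-construction idea in three steps. Responses are sampled from a set of models rather than from one model at several temperatures. Each response is scored with an external function. A preferred and a dispreferred answer are then kept. A minimal sketch under stated assumptions follows; it is not the authors' pipeline. The best-versus-worst pairing rule and the `min_margin` filter are assumptions. All names (`build_preference_pair`, `toy_score`, the `model_*` responders) are hypothetical. The toy word-overlap scorer only stands in for whatever external scoring function MMAIP-V actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PreferencePair:
    question: str
    chosen: str
    rejected: str
    margin: float  # score gap between the chosen and rejected responses


def build_preference_pair(
    question: str,
    responders: Dict[str, Callable[[str], str]],
    score: Callable[[str, str], float],
    min_margin: float = 0.0,
) -> Optional[PreferencePair]:
    """Turn one VQA question into a (chosen, rejected) pair.

    Hypothetical sketch: `responders` stands in for the models whose outputs
    form the response distribution set, and `score` stands in for the
    external scoring function. Neither name comes from the paper.
    """
    if len(responders) < 2:
        raise ValueError("need at least two responders to form a pair")
    # Draw one candidate from each model instead of re-sampling a single
    # model at different temperatures.
    candidates = [respond(question) for respond in responders.values()]
    # Rate every candidate with the external scorer.
    scored = [(score(question, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda sc: sc[0])
    worst_score, worst = min(scored, key=lambda sc: sc[0])
    margin = best_score - worst_score
    # Assumed filter (not from the paper): drop pairs the scorer cannot
    # clearly separate, including exact ties.
    if margin <= min_margin:
        return None
    return PreferencePair(question, best, worst, margin)


def toy_score(question: str, answer: str) -> float:
    # Toy stand-in scorer: count answer words that also appear in the question.
    q_words = set(question.lower().split())
    return float(sum(w in q_words for w in answer.lower().split()))


# Hypothetical responders standing in for different MLLMs.
TOY_RESPONDERS = {
    "model_a": lambda q: "the dog runs on the beach",
    "model_b": lambda q: "a cat sleeps",
    "model_c": lambda q: "the dog",
}

pair = build_preference_pair(
    "what does the dog do on the beach", TOY_RESPONDERS, toy_score
)
print(pair)
```

Pairing only the highest- and lowest-scored answers is one simple choice. Pairing every combination whose score gap exceeds a margin would yield more pairs per question, likely at the cost of noisier labels.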
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
Methods: Sparse Evolutionary Training
