Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, and Yiming Yang

TL;DR
This paper presents a framework that uses detailed video captions as a proxy for video content, so that a language model can assess the factuality of video question answering responses and supply the reward signal used to improve video QA performance.
Contribution
It introduces a method that gives a language model detailed video captions as supporting evidence when scoring video QA responses. The resulting rewards align with GPT-4V's reward mechanism and are used for preference optimization of video multimodal models.
Findings
Improved alignment with GPT-4V reward mechanism.
Enhanced performance on video QA tasks.
Effective use of video captions as content proxies.
Abstract
Preference modeling techniques, such as direct preference optimization (DPO), have been shown to be effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction-following, providing informative feedback, especially for detecting hallucinations in generated responses, remains a significant challenge. Previous studies have explored using large multimodal models (LMMs) as reward models to guide preference modeling, but their ability to accurately assess the factuality of generated responses against the corresponding videos has not been conclusively established. This paper introduces a novel framework that utilizes detailed video captions as a proxy for video content, enabling language models to incorporate this information as supporting evidence for scoring video Question Answering (QA) predictions. Our approach demonstrates…
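To make the preference-optimization step concrete, the following is a minimal sketch of the standard DPO objective for a single preference pair, together with a hypothetical helper that turns scored answers into a (chosen, rejected) pair. The function names, the score scale, the tie-handling rule, and the default `beta` are assumptions for illustration and are not taken from the paper; the scores stand in for whatever a language model assigns to each answer after reading the caption.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the
    policy and under the frozen reference model. beta=0.1 is an
    assumed default, not a value from the paper.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pick_preference_pair(scored_answers):
    """Hypothetical helper: choose the highest- and lowest-scored answers.

    scored_answers is a list of (answer_text, score) tuples, where the
    score is assumed to come from a caption-informed language model.
    Returns (chosen, rejected), or None when all scores are tied.
    """
    ranked = sorted(scored_answers, key=lambda pair: pair[1], reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best[1] == worst[1]:
        return None  # no preference signal in a tie
    return best[0], worst[0]
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy favors the chosen response more strongly than the reference does.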
Taxonomy
Topics: Educational and Technological Research · Computational and Text Analysis Methods · Multimodal Machine Learning Applications
Methods: Direct Preference Optimization
