Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

Daechul Ahn; Yura Choi; Youngjae Yu; Dongyeop Kang; Jonghyun Choi

arXiv:2402.03746 · cs.CV · June 18, 2024 · 2 cites

TL;DR

This paper introduces VLM-RLAIF, a reinforcement learning approach that uses AI-generated preference feedback to improve video-text alignment in large multimodal models, and reports that it outperforms existing models on diverse benchmarks.

Contribution

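The paper applies Reinforcement Learning from AI Feedback (RLAIF) to video large multimodal models: the model provides self-preference feedback to refine itself, and context-aware reward modeling supplies detailed video descriptions as context when that feedback is generated.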

Findings

01. VLM-RLAIF outperforms existing models on diverse benchmarks.

02. Self-preference feedback enhances multimodal alignment.

03. The official code is open-sourced to promote further research.

Abstract

Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). Previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating an LLM with visual encoders, and adding learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume and quality of multimodal instruction-tuning data compared to text-only data. We present a novel alignment strategy, called Reinforcement Learning from AI Feedback (RLAIF), that employs a multimodal AI system to oversee itself, providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. Specifically, we propose context-aware reward modeling by providing detailed video descriptions as context during the generation of preference feedback in order to…
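
The context-aware preference step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the A/B verdict format are all hypothetical, and the call to the judging model itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferencePair:
    """One preference label: the response judged better and the one judged worse."""
    chosen: str
    rejected: str


def build_judge_prompt(
    question: str,
    response_a: str,
    response_b: str,
    video_description: Optional[str] = None,
) -> str:
    """Assemble a text prompt asking a judge model to compare two responses.

    When a detailed video description is supplied, it is placed first so the
    judge can use it as context (the "context-aware" part). The wording here
    is hypothetical and only illustrates the idea.
    """
    parts = []
    if video_description:
        parts.append(f"Video description (context):\n{video_description}")
    parts.append(f"Question:\n{question}")
    parts.append(f"Response A:\n{response_a}")
    parts.append(f"Response B:\n{response_b}")
    parts.append("Which response is better grounded in the video? Answer with A or B.")
    return "\n\n".join(parts)


def to_preference_pair(response_a: str, response_b: str, verdict: str) -> PreferencePair:
    """Turn the judge's A/B verdict into a (chosen, rejected) pair."""
    v = verdict.strip().upper()
    if v not in ("A", "B"):
        raise ValueError(f"Unexpected verdict: {verdict!r}")
    if v == "A":
        return PreferencePair(chosen=response_a, rejected=response_b)
    return PreferencePair(chosen=response_b, rejected=response_a)
```

In this sketch, the prompt would be sent to the judging model, and the resulting pairs would serve as preference data for training a reward model; both of those stages are outside what the snippet covers.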

Code & Models

Repositories

yonseivnl/vlm-rlaif (PyTorch, official)

Taxonomy

Topics: Neural Networks and Reservoir Computing · Reinforcement Learning in Robotics

Methods: Reinforcement Learning from AI Feedback · Shrink and Fine-Tune