RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Tianyu Yu; Haoye Zhang; Qiming Li; Qixin Xu; Yuan Yao; Da Chen; Xiaoman Lu; Ganqu Cui; Yunkai Dang; Taiwen He; Xiaocheng Feng; Jun Song; Bo Zheng; Zhiyuan Liu; Tat-Seng Chua; Maosong Sun

arXiv:2405.17220·cs.CL·October 30, 2025·3 cites

Open Access · 5 Repos · 10 Models · 5 Datasets

TL;DR

RLAIF-V introduces an open-source framework for aligning MLLMs, significantly reducing hallucinations and enhancing trustworthiness through high-quality feedback data and self-feedback mechanisms, achieving results comparable to proprietary models.

Contribution

This work presents RLAIF-V, a novel open-source framework for aligning MLLMs that improves trustworthiness and reduces hallucinations using feedback data and self-feedback guidance.

Findings

01

RLAIF-V 7B reduces object hallucination by 80.7%.

02

RLAIF-V 12B achieves super GPT-4V trustworthiness.

03

Extensive benchmarks show significant trustworthiness improvements.

Abstract

Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge about how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation for preference learning and self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education