RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Tianyu Yu; Yuan Yao; Haoye Zhang; Taiwen He; Yifeng Han; Ganqu Cui; Jinyi Hu; Zhiyuan Liu; Hai-Tao Zheng; Maosong Sun; and Tat-Seng Chua

arXiv:2312.00849·cs.CL·March 11, 2024·5 cites

TL;DR

RLHF-V improves multimodal large language models by using fine-grained human feedback to significantly reduce hallucinations, enhancing trustworthiness with high data and computational efficiency.

Contribution

The paper introduces RLHF-V, a novel method that leverages segment-level human feedback for behavior alignment, significantly reducing hallucinations in MLLMs.

Findings

01 Reduces the base MLLM's hallucination rate by 34.8% using only 1.4k annotated data samples.

02 Outperforms models trained on larger datasets in trustworthiness.

03 Achieves state-of-the-art trustworthiness among open-source MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs prevalently suffer from serious hallucination problems, generating text that is not factually grounded in associated images. The problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V can enable substantially more trustworthy…
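To make the idea of "dense" preference optimization over segment-level corrections more concrete, here is a minimal sketch in plain Python. It assumes that tokens inside human-corrected segments receive a larger weight when a response is scored, and that the resulting scores feed a standard DPO-style loss. The function names, the mask representation, and the values of `gamma` and `beta` are illustrative assumptions, not taken from the text above or from the authors' released code.

```python
import math


def weighted_sequence_logprob(token_logprobs, corrected_mask, gamma=5.0):
    """Weighted average of token log-probs (hypothetical helper).

    Tokens flagged in corrected_mask (inside a human-corrected segment)
    are weighted by gamma; all other tokens are weighted by 1.
    gamma=5.0 is an arbitrary placeholder value.
    """
    if len(token_logprobs) != len(corrected_mask):
        raise ValueError("token_logprobs and corrected_mask must have equal length")
    if not token_logprobs:
        raise ValueError("sequence must contain at least one token")
    weights = [gamma if is_corrected else 1.0 for is_corrected in corrected_mask]
    weighted_sum = sum(w * lp for w, lp in zip(weights, token_logprobs))
    return weighted_sum / sum(weights)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.5):
    """Standard DPO loss for one preference pair, given sequence scores.

    Here "chosen" is the human-corrected response and "rejected" is the
    original hallucinated response. beta=0.5 is a placeholder value.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)


# Toy usage: a 3-token response whose last token was corrected by an annotator.
score = weighted_sequence_logprob([-0.2, -0.4, -1.0], [False, False, True], gamma=3.0)
loss = dpo_loss(policy_chosen=-1.0, policy_rejected=-2.0,
                ref_chosen=-1.5, ref_rejected=-1.5, beta=1.0)
```

In this sketch, up-weighting the corrected tokens means the preference signal concentrates on the spans annotators actually changed instead of being spread evenly over the whole response.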

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Balanced Selection