FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback

Liqiang Jing; Xinya Du

arXiv:2404.05046 · cs.CV · May 7, 2025 · 1 cites

Open Access

TL;DR

This paper introduces FGAIF, a novel approach that uses fine-grained AI feedback to improve alignment in large vision-language models, reducing hallucinations and enhancing performance with fewer parameters.

Contribution

The paper proposes a new fine-grained AI feedback method for aligning LVLMs, addressing limitations of existing RL-based approaches by providing detailed feedback and dense rewards.
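To make the contrast between a single sequence-level reward and dense, fine-grained rewards concrete, here is a minimal sketch. It is not the paper's implementation: the `SegmentFeedback` type, the `+1.0`/`-1.0` reward values, and the hand-written feedback labels are all assumptions chosen for illustration. Only the three hallucination types (object existence, object attribute, object relationship) come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List

# The three hallucination types named in the abstract.
HALLUCINATION_TYPES = ("object_existence", "object_attribute", "object_relationship")


@dataclass
class SegmentFeedback:
    """Hypothetical container: one response segment plus per-type hallucination flags."""
    text: str
    hallucinated: Dict[str, bool]  # type name -> True if that hallucination was detected


def sparse_reward(segments: List[SegmentFeedback]) -> float:
    """Sequence-level reward: one scalar for the whole response (illustrative values)."""
    clean = all(not any(s.hallucinated.values()) for s in segments)
    return 1.0 if clean else 0.0


def dense_rewards(segments: List[SegmentFeedback]) -> List[Dict[str, float]]:
    """Fine-grained reward: one value per segment and per hallucination type.

    Assumed scheme: +1.0 when the segment is clean for a type, -1.0 otherwise.
    """
    return [
        {t: (-1.0 if s.hallucinated.get(t, False) else 1.0) for t in HALLUCINATION_TYPES}
        for s in segments
    ]


# Hand-labelled toy response; in practice the flags would come from an AI feedback model.
response = [
    SegmentFeedback(
        "A dog sits on the sofa.",
        {"object_existence": False, "object_attribute": False, "object_relationship": False},
    ),
    SegmentFeedback(
        "The dog is wearing a red hat.",
        {"object_existence": True, "object_attribute": False, "object_relationship": False},
    ),
]

print(sparse_reward(response))  # a single scalar: no hint about where or what went wrong
for segment, rewards in zip(response, dense_rewards(response)):
    print(segment.text, rewards)  # per-segment, per-type signal
```

The sparse reward collapses the whole response into one number, so the policy cannot tell which sentence hallucinated or which kind of hallucination occurred. The dense variant localizes the penalty to the second segment and to the object-existence type.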

Findings

01 Significantly reduces hallucination issues in LVLMs.

02 Improves performance on visual-language benchmarks.

03 Achieves better results with fewer model parameters.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated proficiency in tackling a variety of visual-language tasks. However, current LVLMs suffer from misalignment between the text and image modalities, which causes three kinds of hallucination problems: object existence, object attribute, and object relationship. To tackle this issue, existing methods mainly use Reinforcement Learning (RL) to align modalities in LVLMs. These methods still suffer from three main limitations: (1) general feedback cannot indicate the type of hallucination contained in the response; (2) sparse rewards provide only a sequence-level reward for the whole response; and (3) annotation is time-consuming and labor-intensive. To handle these limitations, we propose an innovative method to align modalities in LVLMs through Fine-Grained Artificial Intelligence Feedback (FGAIF), which mainly consists of three…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: ALIGN