Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites

Lei Wang; Jiabang He; Shenshen Li; Ning Liu; Ee-Peng Lim

arXiv:2312.01701 · cs.CV · December 5, 2023 · 1 citation

Open Access · 1 repository

TL;DR

This paper introduces ReCaption, a framework that reduces fine-grained hallucinations in large vision-language models (LVLMs) by using ChatGPT to rewrite image captions and then fine-tuning the models on the rewritten captions, which lowers fine-grained hallucination and improves generation quality.
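The two-stage recipe (rewrite captions, then fine-tune on them) can be sketched as a data-preparation step. This is a minimal sketch under stated assumptions, not the authors' code: `rewrite_caption` is a hypothetical stand-in for the ChatGPT call, and the prompt wording, the number of rewrites per caption, the instruction text, and the `TrainingExample` record are all illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical rewrite prompt; not the prompt used in the paper.
REWRITE_PROMPT = (
    "Rewrite the following image caption in different words, "
    "keeping every object, attribute and action it mentions:\n{caption}"
)


@dataclass
class TrainingExample:
    image_id: str
    instruction: str
    target: str  # text the LVLM is fine-tuned to generate for this image


def build_recaption_dataset(
    captions: Dict[str, str],
    rewrite_caption: Callable[[str], str],
    num_rewrites: int = 2,
    instruction: str = "Describe this image in detail.",
) -> List[TrainingExample]:
    """Turn each (image, caption) pair into several rewritten-caption targets.

    `rewrite_caption` is a hypothetical wrapper around an LLM such as ChatGPT;
    with sampling enabled, repeated calls would yield different paraphrases.
    """
    examples: List[TrainingExample] = []
    for image_id, caption in captions.items():
        prompt = REWRITE_PROMPT.format(caption=caption)
        for _ in range(num_rewrites):
            examples.append(
                TrainingExample(image_id, instruction, rewrite_caption(prompt))
            )
    return examples
```

The resulting examples would then feed an ordinary supervised fine-tuning loop for the instruction-tuned LVLM, which is not shown. Passing a stub such as `lambda p: p.splitlines()[-1]` as the rewriter just echoes the original caption, which makes it easy to test the pipeline offline before wiring in a real LLM.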

Contribution

It proposes an approach that combines caption rewriting with fine-tuning to specifically target and mitigate fine-grained hallucinations in LVLMs, and introduces a new evaluation method for assessing hallucination at a fine-grained level.

Findings

01. ReCaption significantly reduces fine-grained hallucinations across different LVLMs.

02. The approach improves the quality of generated text in vision-language tasks.

03. The proposed evaluation method effectively measures fine-grained hallucination levels.

Abstract

Large language models (LLMs) have shown remarkable performance in natural language processing (NLP) tasks. To comprehend and execute diverse human instructions over image data, instruction-tuned large vision-language models (LVLMs) have been introduced. However, LVLMs may suffer from different types of object hallucination. Yet LVLMs are currently evaluated only for coarse-grained object hallucinations (i.e., generated objects that do not exist in the input image). Fine-grained object attributes and behaviors that are absent from the image may still be generated, but current evaluation methods do not measure them. In this paper, we therefore focus on reducing fine-grained hallucinations of LVLMs. We propose ReCaption, a framework that consists of two components: rewriting captions using ChatGPT and fine-tuning the instruction-tuned LVLMs on the rewritten captions. We also propose a…
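To make the coarse-grained versus fine-grained distinction concrete, here is a toy check. It is not the paper's evaluation method: the annotation format (each object mapped to the set of attributes and behaviors true of it) and the pre-extracted `(object, detail)` claims are hypothetical simplifications, whereas a real evaluator would have to extract such claims from free-form generated text.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical annotation format: each object in the image maps to the set of
# attributes and behaviors that are actually true of it.
GroundTruth = Dict[str, Set[str]]
# A claim extracted from generated text: (object, attribute/behavior or None).
Claim = Tuple[str, Optional[str]]


def coarse_hallucinations(claims: List[Claim], truth: GroundTruth) -> List[str]:
    """Objects mentioned in the output that do not appear in the image."""
    return sorted({obj for obj, _ in claims if obj not in truth})


def fine_grained_hallucinations(
    claims: List[Claim], truth: GroundTruth
) -> List[Tuple[str, str]]:
    """Attributes/behaviors attached to a real object that are not true of it."""
    return sorted(
        (obj, detail)
        for obj, detail in claims
        if obj in truth and detail is not None and detail not in truth[obj]
    )


# Toy example: the image shows a brown dog running on green grass.
truth: GroundTruth = {"dog": {"brown", "running"}, "grass": {"green"}}
claims: List[Claim] = [("dog", "black"), ("dog", "running"), ("grass", None), ("frisbee", None)]

print(coarse_hallucinations(claims, truth))        # ['frisbee']
print(fine_grained_hallucinations(claims, truth))  # [('dog', 'black')]
```

The claim "a black dog" mentions an object that really is in the image, so an object-level check passes it; only the attribute-level check flags it, which is the gap in coarse-grained evaluation that the abstract describes.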

Code & Models

Repositories

anonymousanoy/fohe
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Focus