CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

Hongyu Hu; Jiyuan Zhang; Minyi Zhao; Zhenbang Sun

arXiv:2309.02301·cs.CV·November 27, 2023·2 cites

TL;DR

This paper introduces CIEM, an automatic evaluation pipeline for hallucination in vision-language models, and CIT, a new instruction tuning method that reduces hallucinations by generating high-quality factual and contrastive data for model training.

Contribution

The paper presents a novel contrastive evaluation method (CIEM) and an instruction tuning approach (CIT) that together improve hallucination detection and reduction in large vision-language models.

Findings

01

CIT-tuned VLMs outperform baseline models in hallucination mitigation.

02

CIEM effectively identifies hallucination issues in existing VLMs.

03

CIT enhances model factual accuracy and robustness.

Abstract

Research on Large Vision-Language Models (LVLMs) has advanced significantly thanks to the success of Large Language Models (LLMs). Nevertheless, these Vision-Language Models (VLMs) suffer from hallucination: because of an insufficient understanding of the vision and language modalities, VLMs may generate incorrect perceptual information in downstream applications, for example, captioning a non-existent entity. To address the hallucination phenomenon, on the one hand, we introduce a Contrastive Instruction Evaluation Method (CIEM), an automatic pipeline that leverages an annotated image-text dataset coupled with an LLM to generate factual/contrastive question-answer pairs for evaluating hallucination in VLMs. On the other hand, based on CIEM, we further propose a new instruction tuning method called CIT (the abbreviation of…
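The factual/contrastive question-answer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the paper uses an LLM to generate the pairs, whereas the sketch below swaps in a fixed question template, and the names `QAPair`, `build_qa_pairs`, and `accuracy_by_kind` are hypothetical. It assumes each image comes with a list of annotated objects and that a candidate object vocabulary is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str  # "yes" or "no"
    kind: str    # "factual" or "contrastive"


def build_qa_pairs(present_objects, candidate_objects):
    """Build yes/no questions from object annotations (template-based stand-in
    for LLM generation; an assumption, not the paper's method)."""
    present = set(present_objects)
    pairs = []
    # Factual questions: the object is annotated, so the expected answer is "yes".
    for obj in sorted(present):
        pairs.append(QAPair(f"Is there a {obj} in the image?", "yes", "factual"))
    # Contrastive questions: the object is absent, so the expected answer is "no".
    for obj in sorted(set(candidate_objects) - present):
        pairs.append(QAPair(f"Is there a {obj} in the image?", "no", "contrastive"))
    return pairs


def accuracy_by_kind(pairs, predictions):
    """Score model answers separately for factual and contrastive questions."""
    totals, correct = {}, {}
    for pair, pred in zip(pairs, predictions):
        totals[pair.kind] = totals.get(pair.kind, 0) + 1
        if pred.strip().lower() == pair.answer:
            correct[pair.kind] = correct.get(pair.kind, 0) + 1
    return {kind: correct.get(kind, 0) / totals[kind] for kind in totals}


# Toy example: an image annotated with a dog and a frisbee.
pairs = build_qa_pairs(["dog", "frisbee"], ["dog", "frisbee", "cat", "car"])

# A model that answers "yes" to everything looks perfect on factual questions
# but fails every contrastive one, which is how hallucination shows up here.
always_yes = ["yes"] * len(pairs)
scores = accuracy_by_kind(pairs, always_yes)
print(scores)
```

Reporting the two question kinds separately is a design choice of this sketch: a model biased toward affirming whatever it is asked about can score well on factual questions alone, so the contrastive score is the one that exposes claims about non-existent entities.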

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications