CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun

TL;DR
This paper introduces CIEM, an automatic evaluation pipeline for hallucination in vision-language models, and CIT, a new instruction tuning method that reduces hallucinations by generating high-quality factual and contrastive data for model training.
Contribution
The paper presents a novel contrastive evaluation method (CIEM) and an instruction tuning approach (CIT) that together improve hallucination detection and reduction in large vision-language models.
Findings
CIT-tuned VLMs outperform baseline models in hallucination mitigation.
CIEM effectively identifies hallucination issues in existing VLMs.
CIT enhances model factual accuracy and robustness.
Abstract
Research on Large Vision-Language Models (LVLMs) has advanced significantly thanks to the success of Large Language Models (LLMs). Nevertheless, these Vision-Language Models (VLMs) suffer from hallucination: due to insufficient understanding of the vision and language modalities, VLMs may generate incorrect perceptual information in downstream applications, for example, captioning a non-existent entity. To address hallucination, on the one hand, we introduce a Contrastive Instruction Evaluation Method (CIEM), an automatic pipeline that leverages an annotated image-text dataset coupled with an LLM to generate factual/contrastive question-answer pairs for evaluating hallucination in VLMs. On the other hand, based on CIEM, we further propose a new instruction tuning method called CIT (the abbreviation of…
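The factual/contrastive question-answer idea described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a simple template in place of the LLM generation step, treats annotations as a set of object names, and uses hypothetical names (`build_qa_pairs`, `accuracy`, the toy vocabulary) throughout. It only shows how questions about present objects (expected answer "yes") and absent objects (expected answer "no") could be paired and scored.

```python
from typing import Callable, Dict, List, Set


def build_qa_pairs(present: Set[str], vocabulary: List[str]) -> List[Dict[str, str]]:
    """Template-based stand-in (an assumption) for the LLM generation step.

    Factual pairs ask about annotated objects; contrastive pairs ask about
    vocabulary objects that are NOT annotated for the image.
    """
    pairs: List[Dict[str, str]] = []
    for obj in sorted(present):
        pairs.append({"question": f"Is there a {obj} in the image?",
                      "answer": "yes", "kind": "factual"})
    for obj in vocabulary:
        if obj not in present:
            pairs.append({"question": f"Is there a {obj} in the image?",
                          "answer": "no", "kind": "contrastive"})
    return pairs


def accuracy(pairs: List[Dict[str, str]], answer_fn: Callable[[str], str]) -> float:
    """Fraction of questions where the model's yes/no answer matches the label."""
    if not pairs:
        return 0.0
    correct = sum(1 for p in pairs
                  if answer_fn(p["question"]).strip().lower() == p["answer"])
    return correct / len(pairs)


# Toy data (hypothetical): one image annotated with two objects.
present = {"dog", "frisbee"}
vocabulary = ["dog", "frisbee", "cat", "car"]
pairs = build_qa_pairs(present, vocabulary)

# A degenerate "model" that always answers yes gets every contrastive
# question wrong, which is the kind of hallucination such questions expose.
always_yes = lambda question: "yes"
score = accuracy(pairs, always_yes)
```

In this toy setup the always-yes responder answers only the factual half correctly, so a model biased toward affirming objects is penalized on the contrastive questions.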
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications
