Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

Chaoya Jiang; Haiyang Xu; Mengfan Dong; Jiaxing Chen; Wei Ye; Ming Yan; Qinghao Ye; Ji Zhang; Fei Huang; Shikun Zhang

arXiv:2312.06968·cs.CV·February 27, 2024·6 cites

TL;DR

This paper introduces a contrastive learning approach that reduces hallucinations in multimodal large language models (MLLMs) by improving cross-modal representation alignment and by separating the representations of texts that contain hallucinations from those that do not.

Contribution

It proposes a novel contrastive learning method that uses hallucination-containing texts as hard negatives to enhance MLLMs' accuracy and reduce erroneous outputs.
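To make the "hard negatives" idea concrete, here is a minimal sketch of an image-to-text InfoNCE-style loss in which hallucinated captions are appended to the ordinary negatives. It is not the authors' implementation (the official PyTorch code is in the x-plug/mplug-halowl repository). The function names, the use of cosine similarity, the single image-to-text direction, and the temperature of 0.07 are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def image_to_text_loss(
    image: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    hard_negatives: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> float:
    """Hypothetical contrastive loss: -log softmax score of the matching caption.

    Hallucinated captions (hard_negatives) are simply extra candidates in the
    softmax denominator, so the model is penalized when they score close to
    the image, even though they look textually similar to the true caption.
    """
    # The positive caption is always candidate 0.
    candidates = [positive] + list(negatives) + list(hard_negatives)
    logits = [cosine(image, t) / temperature for t in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

Because a hallucinated caption is usually much closer to the image than a random in-batch caption, adding it to the denominator raises the loss noticeably, which is what pushes its representation away from the image:

```python
img = [1.0, 0.0]
true_caption = [1.0, 0.1]
unrelated = [[0.0, 1.0]]
hallucinated = [[1.0, 0.3]]
print(image_to_text_loss(img, true_caption, unrelated, []))            # close to 0
print(image_to_text_loss(img, true_caption, unrelated, hallucinated))  # clearly larger
```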

Findings

01. Significant reduction in hallucination occurrences.

02. 34.66% / 29.5% improvement on MMHal-Bench.

03. Enhanced cross-modal representation alignment.

Abstract

Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation: hallucination, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from the novel perspective of representation learning. We first analyze the representation distributions of textual and visual tokens in MLLMs, which reveals two important findings: (1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; and (2) the representations of texts that do and do not contain hallucinations are entangled, making them difficult to distinguish. These two observations inspire a simple yet effective method to mitigate hallucinations. Specifically, we…
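One common way to put a number on a "gap between textual and visual representations" is the Euclidean distance between the centroids of the L2-normalized embeddings of each modality. The sketch below computes that quantity. It is an illustrative assumption, not necessarily the analysis used in the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def modality_gap(
    visual_tokens: Sequence[Sequence[float]],
    text_tokens: Sequence[Sequence[float]],
) -> float:
    """Hypothetical gap measure: distance between per-modality centroids.

    0 means the two modalities are centered at the same point on the unit
    sphere; larger values mean the visual and textual clouds sit apart.
    """
    cv = centroid([l2_normalize(v) for v in visual_tokens])
    ct = centroid([l2_normalize(t) for t in text_tokens])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cv, ct)))
```

Normalizing first keeps the measure about direction rather than magnitude, so two sets that differ only in scale report no gap, while orthogonal clusters on the unit sphere report a gap of about 1.414.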

Code & Models

Repositories

x-plug/mplug-halowl
PyTorch · Official

Taxonomy

Methods: Contrastive Learning