Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

TL;DR
This paper introduces a contrastive learning approach to reduce hallucinations in multimodal large language models by improving cross-modal representation alignment and distinguishing hallucination-containing texts.
Contribution
It proposes a novel contrastive learning method that uses hallucination-containing texts as hard negatives to enhance MLLMs' accuracy and reduce erroneous outputs.
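As a rough illustration of the idea, the sketch below computes an image-to-text InfoNCE-style loss in which each image's paired caption is the positive, and both the other captions in the batch and a set of hallucination-containing captions act as negatives. It is not the authors' implementation: the function name `hard_negative_contrastive_loss`, the temperature of 0.07, the use of cosine similarity, and the single image-to-text direction are all assumptions made for this example.

```python
import math

def normalize(v):
    # Scale a vector to unit length (a zero vector is left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hard_negative_contrastive_loss(image_embs, text_embs, halluc_embs,
                                   temperature=0.07):
    """Hypothetical image-to-text contrastive loss with hard negatives.

    image_embs[i] and text_embs[i] form a positive pair. Every other
    caption in text_embs and every caption in halluc_embs is a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    negs = [normalize(v) for v in halluc_embs]

    total = 0.0
    for i, img in enumerate(imgs):
        # Cosine similarities to all candidates: true captions, then hard negatives.
        logits = [dot(img, t) / temperature for t in txts + negs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the paired caption (index i) as the target.
        total += log_denom - logits[i]
    return total / len(imgs)
```

In this sketch the loss approaches zero when hallucinated captions lie far from their images, and it cannot fall below log 2 when a hallucinated caption is indistinguishable from the true one, which is the pressure that pushes the two kinds of text apart.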
Findings
Significant reduction in hallucination occurrences.
34.66% / 29.5% improvement on MMhal-Bench.
Enhanced cross-modal representation alignment.
Abstract
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning. We first analyzed the representation distribution of textual and visual tokens in MLLM, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that contain and do not contain hallucinations are entangled, making it challenging to distinguish them. These two observations inspire us with a simple yet effective method to mitigate hallucinations. Specifically, we…
Taxonomy
Methods: Contrastive Learning
