CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Zhehan Kan; Ce Zhang; Zihan Liao; Yapeng Tian; Wenming Yang; Junyuan Xiao; Xu Li; Dongmei Jiang; Yaowei Wang; Qingmin Liao

arXiv:2411.12713·cs.CV·November 20, 2024

TL;DR

CATCH is a novel decoding method designed to reduce hallucinations in LVLMs by separating visual information, detecting hallucinations, and adaptively correcting token-level outputs, thereby improving reliability in critical applications.

Contribution

The paper introduces CATCH, a comprehensive approach combining visual decoupling, hallucination detection, and adaptive decoding to mitigate hallucinations in LVLMs without additional training.

Findings

01. Effective hallucination reduction in LVLMs across multiple tasks.

02. Robust generalization to new tasks without extra training.

03. Applicable without specific data or prior knowledge.

Abstract

Large Vision-Language Model (LVLM) systems have demonstrated impressive vision-language reasoning capabilities but suffer from pervasive and severe hallucination issues, posing significant risks in critical domains such as healthcare and autonomous systems. Despite previous efforts to mitigate hallucinations, a persistent issue remains: visual defect from vision-language misalignment, creating a bottleneck in visual processing capacity. To address this challenge, we develop Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs (CATCH), based on the Information Bottleneck theory. CATCH introduces Complementary Visual Decoupling (CVD) for visual information separation, Non-Visual Screening (NVS) for hallucination detection, and Adaptive Token-level Contrastive Decoding (ATCD) for hallucination mitigation. CATCH addresses issues related to visual…
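
To make the idea of token-level contrastive decoding concrete, here is a minimal sketch of the generic technique, not the authors' ATCD. The abstract is truncated and does not give CATCH's adaptive rule, so everything below is an assumption for illustration: the function name `contrastive_next_token`, the contrast strength `alpha`, the plausibility cutoff `beta`, and the toy logits are all hypothetical. The sketch contrasts next-token logits obtained with the original visual input against logits obtained with an altered visual input, and penalizes tokens that remain likely even when the visual evidence is changed.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_next_token(logits_full, logits_contrast, alpha=1.0, beta=0.1):
    """Pick the next token id with generic contrastive decoding.

    logits_full:     logits conditioned on the original visual input.
    logits_contrast: logits conditioned on an altered visual input.
    alpha:           contrast strength (hypothetical parameter).
    beta:            plausibility cutoff relative to the top probability
                     under logits_full (hypothetical parameter).
    """
    if len(logits_full) != len(logits_contrast):
        raise ValueError("both logit lists must cover the same vocabulary")

    # Only tokens that are reasonably likely under the full input may win,
    # so the contrast cannot promote an implausible token.
    probs_full = softmax(logits_full)
    cutoff = beta * max(probs_full)

    best_id, best_score = None, -math.inf
    for token_id, (lf, lc) in enumerate(zip(logits_full, logits_contrast)):
        if probs_full[token_id] < cutoff:
            continue
        # Amplify what the full input supports, subtract what the altered
        # input still predicts (a proxy for non-visual priors).
        score = (1.0 + alpha) * lf - alpha * lc
        if score > best_score:
            best_id, best_score = token_id, score
    return best_id


# Toy vocabulary: 0 = "dog", 1 = "cat", 2 = "frisbee" (made-up numbers).
full = [2.0, 1.0, 2.2]
contrast = [1.0, 1.0, 2.5]
greedy_choice = contrastive_next_token(full, contrast, alpha=0.0)
contrastive_choice = contrastive_next_token(full, contrast, alpha=1.0)
```

In this toy case, plain greedy decoding (`alpha=0.0`) selects token 2, which the altered input also favors, while the contrastive score selects token 0, whose support comes mainly from the original visual input. A token-level adaptive scheme would presumably vary the contrast per token rather than use a fixed `alpha`, but that choice is not specified in the text above.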

Taxonomy

Topics: Neurological disorders and treatments · Topological and Geometric Data Analysis · Digital Image Processing Techniques