Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision

Seongyun Lee; Sue Hyun Park; Yongrae Jo; Minjoon Seo

arXiv:2311.07362 · cs.CL · April 3, 2024 · 1 citation

TL;DR

Volcano is a multimodal model that generates natural language feedback on its own initial response, grounded in the provided image, and uses that feedback to revise the response. This reduces hallucination and achieves state-of-the-art results on MMHal-Bench, POPE, and GAVIE.

Contribution

The paper proposes a self-feedback mechanism for multimodal models: the model critiques its initial response against the visual input and then revises that response using the visually grounded feedback, which mitigates hallucination.

Findings

01. Reduces multimodal hallucination effectively.

02. Achieves state-of-the-art results on MMHal-Bench, POPE, and GAVIE.

03. Improves general multimodal abilities, outperforming previous models on MM-Vet and MMBench.

Abstract

Large multimodal models suffer from multimodal hallucination, where they provide incorrect responses misaligned with the given visual information. Recent works have conjectured that one of the reasons behind multimodal hallucination is the vision encoder failing to ground on the image properly. To mitigate this issue, we propose a novel approach that leverages self-feedback as visual cues. Building on this approach, we introduce Volcano, a multimodal self-feedback guided revision model. Volcano generates natural language feedback on its initial response based on the provided visual information and utilizes this feedback to self-revise its initial response. Volcano effectively reduces multimodal hallucination and achieves state-of-the-art results on MMHal-Bench, POPE, and GAVIE. It also improves general multimodal abilities and outperforms previous models on MM-Vet and MMBench. Through…
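The answer, feedback, and revision steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `generate` callable, the prompt wording, and the `stub_generate` stand-in are all hypothetical, and the sketch performs a single feedback and revision pass only.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping (image, prompt) to a text reply.
# A real multimodal model would be wrapped to fit this signature.
Generate = Callable[[object, str], str]


def self_feedback_revise(generate: Generate, image: object, question: str) -> Dict[str, str]:
    """Run one answer -> feedback -> revision pass with a single model."""
    # Step 1: the model answers the question about the image.
    initial = generate(image, question)

    # Step 2: the same model writes feedback on its answer, given the image.
    # The prompt wording is an assumption made for this sketch.
    feedback_prompt = (
        "Give feedback on the response using only what is visible in the image.\n"
        f"Question: {question}\nResponse: {initial}"
    )
    feedback = generate(image, feedback_prompt)

    # Step 3: the model revises its initial answer using that feedback.
    revise_prompt = (
        "Revise the response using the feedback.\n"
        f"Question: {question}\nResponse: {initial}\nFeedback: {feedback}"
    )
    revised = generate(image, revise_prompt)

    return {"initial": initial, "feedback": feedback, "revised": revised}


def stub_generate(image: object, prompt: str) -> str:
    """Toy stand-in for a model, used only to exercise the control flow."""
    if prompt.startswith("Revise"):
        return "A dog on grass."
    if prompt.startswith("Give feedback"):
        return "The image shows a dog, not a cat."
    return "A cat on grass."


result = self_feedback_revise(stub_generate, None, "What animal is shown?")
print(result["revised"])
```

Passing the image to every call reflects the abstract's point that the feedback is based on the provided visual information, so the feedback acts as an extra visual cue during revision.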

Code & Models

Repositories

kaistai/volcano (PyTorch, official)

Taxonomy

Topics: Hallucinations in medical conditions