HallE-Control: Controlling Object Hallucination in Large Multimodal Models

Bohan Zhai; Shijia Yang; Chenfeng Xu; Sheng Shen; Kurt Keutzer; Chunyuan Li; Manling Li

arXiv:2310.01779·cs.CV·April 1, 2024·1 cites

TL;DR

This paper introduces HallE-Control, a method to reduce object hallucination in large multimodal models by controlling the use of contextual and parametric knowledge, improving caption accuracy without losing coverage.

Contribution

It proposes a novel controllable LMM, HallE-Control, that manages hallucination levels in object captioning by leveraging different knowledge sources.

Findings

01. HallE-Control reduces hallucination by 44% compared to LLaVA_7B.

02. The proposed evaluation method, CCEval, reveals existing models' susceptibility to object hallucination.

03. Controlling knowledge sources maintains object coverage while reducing hallucination.

Abstract

Current Large Multimodal Models (LMMs) achieve remarkable progress, yet there remains significant uncertainty regarding their ability to accurately apprehend visual details, that is, to perform detailed captioning. To address this, we introduce $CCEval$, a GPT-4 assisted evaluation method for detailed captioning. Interestingly, while LMMs demonstrate minimal object existence hallucination in existing VQA benchmarks, our proposed evaluation reveals continued susceptibility to such hallucinations. In this paper, we make the first attempt to investigate such hallucination from different aspects, including image resolution, the language decoder size, and the amount, quality, and granularity of instruction data. Our findings underscore the unwarranted inference when the language description includes details at a finer object granularity than what the vision module can ground or verify, thus…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization