HallE-Control: Controlling Object Hallucination in Large Multimodal Models
Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

TL;DR
This paper introduces HallE-Control, a method to reduce object hallucination in large multimodal models by controlling the use of contextual and parametric knowledge, improving caption accuracy without losing coverage.
Contribution
It proposes HallE-Control, a novel controllable LMM that adjusts the level of object hallucination in detailed captioning by controlling how much the model relies on contextual versus parametric knowledge.
Findings
HallE-Control reduces hallucination by 44% compared to LLaVA-7B.
The proposed evaluation method, CCEval, reveals existing models' susceptibility to object hallucination.
Controlling knowledge sources maintains object coverage while reducing hallucination.
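The findings trade off two quantities: how many mentioned objects are hallucinated and how many real objects are covered. The following is a minimal sketch of object-level scores under stated assumptions. It assumes the objects mentioned in each caption have already been extracted and matched to a ground-truth object list (CCEval is described as GPT-4 assisted, and that matching step is not reproduced here). The CHAIR-style definitions and the pooled coverage ratio are standard illustrative choices, not necessarily CCEval's exact formulas. `corpus_metrics`, `relative_reduction`, and `EXAMPLE_PAIRS` are hypothetical names, and the example data are invented.

```python
from dataclasses import dataclass


@dataclass
class CaptionEval:
    """Object-level outcome for one caption (hypothetical helper)."""
    hallucinated: set[str]  # mentioned objects absent from the ground truth
    covered: set[str]       # ground-truth objects the caption mentions


def score_caption(mentioned: set[str], ground_truth: set[str]) -> CaptionEval:
    """Compare already-extracted caption objects against ground truth."""
    return CaptionEval(hallucinated=mentioned - ground_truth,
                       covered=mentioned & ground_truth)


def corpus_metrics(pairs: list[tuple[set[str], set[str]]]) -> dict[str, float]:
    """Pool counts over (mentioned, ground_truth) pairs.

    chair_i:  hallucinated objects / all mentioned objects
    chair_s:  fraction of captions with at least one hallucinated object
    coverage: mentioned ground-truth objects / all ground-truth objects
              (pooled; an assumed definition for illustration)
    """
    mentioned_total = hallucinated_total = 0
    captions_with_halluc = 0
    gt_total = covered_total = 0
    for mentioned, gt in pairs:
        ev = score_caption(mentioned, gt)
        mentioned_total += len(mentioned)
        hallucinated_total += len(ev.hallucinated)
        captions_with_halluc += int(bool(ev.hallucinated))
        gt_total += len(gt)
        covered_total += len(ev.covered)
    return {
        "chair_i": hallucinated_total / mentioned_total if mentioned_total else 0.0,
        "chair_s": captions_with_halluc / len(pairs) if pairs else 0.0,
        "coverage": covered_total / gt_total if gt_total else 0.0,
    }


def relative_reduction(baseline: float, new: float) -> float:
    """Relative drop of a hallucination rate versus a baseline model."""
    return (baseline - new) / baseline


# Invented toy data: (objects mentioned in caption, ground-truth objects).
EXAMPLE_PAIRS = [
    ({"dog", "frisbee", "car"}, {"dog", "frisbee", "tree"}),  # "car" is hallucinated
    ({"cat", "sofa"}, {"cat", "sofa", "lamp"}),               # "lamp" is missed
]

if __name__ == "__main__":
    print(corpus_metrics(EXAMPLE_PAIRS))
    print(relative_reduction(0.5, 0.28))
```

On the toy data, one of five mentioned objects is hallucinated and four of six ground-truth objects are covered. A headline figure such as "44% less hallucination" is a relative reduction of this kind of rate. The inputs 0.5 and 0.28 above are made-up values that happen to give 0.44. They are not the paper's measurements.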
Abstract
Current Large Multimodal Models (LMMs) have made remarkable progress, yet it remains uncertain whether they accurately apprehend visual details, that is, whether they can perform detailed captioning. To address this, we introduce CCEval, a GPT-4 assisted evaluation method for detailed captioning. Interestingly, while LMMs show minimal object existence hallucination on existing VQA benchmarks, our proposed evaluation reveals that they remain susceptible to such hallucinations. In this paper, we make the first attempt to investigate such hallucination from several aspects, including image resolution, language decoder size, and the amount, quality, and granularity of instruction data. Our findings underscore the unwarranted inference when the language description includes details at a finer object granularity than what the vision module can ground or verify, thus…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization
