Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models

Kohou Wang; Xiang Liu; Zhaoxiang Liu; Kai Wang; Shiguo Lian

arXiv:2408.01003·cs.AI·August 5, 2024

TL;DR

Piculet is a training-free approach that reduces hallucinations in multimodal large language models by using specialized models to extract and incorporate detailed visual descriptions into the input, improving alignment without retraining.

Contribution

Introducing Piculet, a universal, training-free method that enhances MLLMs' input with visual descriptions from specialized models to decrease hallucinations.

Findings

01

Significantly reduces hallucinations in MLLMs.

02

Effective across different MLLMs without retraining.

03

Improves alignment between image content and generated text.

Abstract

Multimodal Large Language Models (MLLMs) have made significant progress in bridging the gap between visual and language modalities. However, hallucinations in MLLMs, where the generated text does not align with image content, continue to be a major challenge. Existing methods for addressing hallucinations often rely on instruction tuning, which requires retraining the model on specific data and further increases the cost of using MLLMs. In this paper, we introduce a novel training-free method, named Piculet, for enhancing the input representation of MLLMs. Piculet leverages multiple specialized models to extract descriptions of visual information from the input image and combines these descriptions with the original image and query as input to the MLLM. We evaluate our method both quantitatively and qualitatively, and the results demonstrate that Piculet greatly decreases…
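The input-augmentation idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Describer` type, the `MLLMInput` container, the `build_augmented_input` function, the prompt wording, and the two stub models are all hypothetical. The abstract does not say which specialized models Piculet uses or how their outputs are formatted.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: a "specialized model" is any callable that maps
# raw image bytes to a short text description (may be empty).
Describer = Callable[[bytes], str]


@dataclass
class MLLMInput:
    """Hypothetical container for what is passed to the MLLM."""
    image: bytes   # the original image, passed through unchanged
    prompt: str    # the query, prefixed with extracted descriptions


def build_augmented_input(
    image: bytes, query: str, describers: Dict[str, Describer]
) -> MLLMInput:
    """Combine specialized-model descriptions with the image and query.

    No model weights are touched: only the text prompt is enriched,
    which is what makes this kind of approach training-free.
    """
    lines = []
    for name, describe in describers.items():
        text = describe(image).strip()
        if text:  # skip models that found nothing to report
            lines.append(f"- {name}: {text}")

    if lines:
        context = "Visual facts extracted by external models:\n" + "\n".join(lines)
        prompt = f"{context}\n\nQuestion: {query}"
    else:
        prompt = f"Question: {query}"
    return MLLMInput(image=image, prompt=prompt)


# Stub describers standing in for real specialized models (assumed examples).
def fake_detector(image: bytes) -> str:
    return "2 dogs, 1 frisbee"


def fake_ocr(image: bytes) -> str:
    return ""  # no text found in the image


if __name__ == "__main__":
    example = build_augmented_input(
        b"<image bytes>",
        "How many dogs are in the picture?",
        {"objects": fake_detector, "ocr": fake_ocr},
    )
    print(example.prompt)
```

In this sketch the MLLM still receives the original image; the extracted descriptions only add explicit textual evidence next to the query, so the same wrapper could in principle sit in front of different MLLMs.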

Taxonomy

Topics: Topic Modeling

Methods: ALIGN