Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions

Yuhan Fu; Ruobing Xie; Jiazhen Liu; Bangxiang Lan; Xingwu Sun; Zhanhui Kang; Xirong Li

arXiv:2410.11701·cs.CL·February 24, 2025

TL;DR

This paper introduces MagPrompt, a simple, training-free prompting method that reduces hallucinations in multimodal large language models by instructing the model to focus more on the image and to prioritize the image when it conflicts with the model's inner knowledge.

Contribution

The paper proposes MagPrompt, a novel, simple prompt design principle that effectively mitigates hallucinations in MLLMs without additional training or complex methods.

Findings

1. MagPrompt reduces hallucinations across multiple datasets.
2. It outperforms or matches complex methods like VCD.
3. It is applicable to both open-source and closed-source models.

Abstract

Hallucinations in multimodal large language models (MLLMs) hinder their practical applications. To address this, we propose a Magnifier Prompt (MagPrompt), a simple yet effective method to tackle hallucinations in MLLMs via extremely simple instructions. MagPrompt is based on the following two key principles, which guide the design of various effective prompts, demonstrating robustness: (1) MLLMs should focus more on the image. (2) When there are conflicts between the image and the model's inner knowledge, MLLMs should prioritize the image. MagPrompt is training-free and can be applied to open-source and closed-source models, such as GPT-4o and Gemini-pro. It performs well across many datasets, and its effectiveness is comparable to or even better than that of more complex methods like VCD. Furthermore, our prompt design principles and experimental analyses provide valuable insights into multimodal…
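
The two principles in the abstract can be illustrated with a minimal sketch of how such an instruction might be prepended to a user question. The instruction wording below is an assumption written for illustration, not the prompt text from the paper, and the names `MAGNIFIER_INSTRUCTION` and `build_magnifier_prompt` are hypothetical. The sketch only builds the text part of the request; attaching the image and calling a model are left to whichever MLLM client is in use.

```python
# Hypothetical instruction text encoding the two principles from the abstract:
# (1) focus more on the image, (2) prioritize the image over inner knowledge.
# This wording is an assumption, not the prompt used in the paper.
MAGNIFIER_INSTRUCTION = (
    "Look carefully at the image before answering. "
    "If what you see in the image conflicts with what you already know, "
    "trust the image."
)


def build_magnifier_prompt(question: str,
                           instruction: str = MAGNIFIER_INSTRUCTION) -> str:
    """Return the text prompt: the instruction followed by the user question."""
    question = question.strip()
    if not question:
        raise ValueError("question must be non-empty")
    # The instruction comes first so the model reads it before the question.
    return f"{instruction}\n\n{question}"


if __name__ == "__main__":
    print(build_magnifier_prompt("Is there a dog in the picture?"))
```

Because the method is training-free, a change like this touches only the text sent alongside the image, which is why it would apply equally to open-source models and to closed-source ones reached through an API.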

Taxonomy

Topics: Hallucinations in medical conditions

Methods: Focus