Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

TL;DR
This paper introduces Pfram, a parameter-free metric to analyze image representations in multimodal models, revealing factors contributing to hallucinations and guiding improvements in model alignment with human perception.
Contribution
We propose Pfram, a novel parameter-free metric for assessing image representation alignment, enabling analysis of hallucination causes in multimodal large language models.
Findings
Pfram correlates strongly with object hallucination across models.
Different modules and instructions impact image representation quality.
Alternative visual encoders can improve model alignment.
Abstract
Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations in isolation from all other factors, we propose a parameter-free representation alignment metric (Pfram) that measures the similarity between any two representation systems without requiring additional training parameters. Notably, Pfram can also assess the alignment of a neural representation system with the human representation system, represented by ground-truth annotations of images. By evaluating the alignment with object annotations, we demonstrate that this metric shows strong and consistent correlations with object hallucination across a wide range of…
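To make the idea of a training-free alignment score concrete, here is a minimal sketch. It is not the paper's definition of Pfram, whose exact formulation is not given in the truncated abstract above. It assumes one common parameter-free approach: for each image, find its k nearest neighbours under each representation system and measure how much the two neighbour sets overlap. The function names `knn_indices` and `mutual_knn_alignment`, the use of cosine similarity, and the toy data are all illustrative assumptions. Object annotations are assumed to be encoded as multi-hot label vectors.

```python
import numpy as np

def knn_indices(features, k):
    """For each row, return indices of its k most cosine-similar other rows."""
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.maximum(norms, 1e-12)      # guard against all-zero rows
    sim = x @ x.T                         # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)        # an image is not its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn_alignment(feats_a, feats_b, k=1):
    """Average fraction of shared k-nearest neighbours between two systems.

    feats_a and feats_b describe the same images in the same order, but may
    have different dimensionality. No parameters are trained.
    """
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))

# Toy data (hypothetical): four images, a 2-d "model" representation,
# and two alternative multi-hot object annotations.
model_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels_agree = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
labels_clash = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

print(mutual_knn_alignment(model_feats, labels_agree, k=1))  # 1.0
print(mutual_knn_alignment(model_feats, labels_clash, k=1))  # 0.0
```

Because the score depends only on neighbour rankings within each system, the two representations never need to be mapped into a shared space, which is what keeps a measure of this kind free of trained parameters.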
Taxonomy
Topics: Mental Health and Psychiatry · Hallucinations in medical conditions · Complex Systems and Time Series Analysis
