Stop learning it all to mitigate visual hallucination, Focus on the hallucination target
Dokyoon Yoon, Youngsook Song, Woomyong Park

TL;DR
This paper introduces a preference learning approach that reduces hallucinations in multimodal large language models by focusing on hallucination targets, leading to more accurate and reliable responses.
Contribution
The paper presents a novel preference learning method that targets hallucination areas in MLLMs, utilizing a specialized dataset to improve factual accuracy without harming overall performance.
Findings
Significant reduction in hallucinations across multiple tasks
Enhanced model reliability and factual accuracy
Maintained overall performance levels
Abstract
Multimodal Large Language Models (MLLMs) frequently suffer from hallucination issues, generating information about objects that are not present in input images during vision-language tasks. These hallucinations particularly undermine model reliability in practical applications requiring accurate object identification. To address this challenge, we propose a preference learning approach that mitigates hallucinations by focusing on targeted areas where they occur. To implement this, we build a dataset containing hallucinated responses, correct responses, and target information (i.e., objects present in the images and the corresponding chunk positions in responses affected by hallucinations). By applying a preference learning method restricted to these specific targets, the model can filter out irrelevant signals and focus on correcting hallucinations. This allows the model to…
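The idea of restricting preference learning to target chunks can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss, so the code assumes a DPO-style objective and simply masks per-token log-probabilities so that only tokens inside the annotated chunk positions contribute. The names PreferenceSample, span_mask, masked_logratio, and targeted_preference_loss, the field names, and the value of beta are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PreferenceSample:
    """Hypothetical record mirroring the dataset described in the abstract."""
    objects_in_image: List[str]            # objects actually present in the image
    chosen_spans: List[Tuple[int, int]]    # [start, end) token spans in the correct response
    rejected_spans: List[Tuple[int, int]]  # [start, end) token spans in the hallucinated response


def span_mask(length: int, spans: List[Tuple[int, int]]) -> List[float]:
    """Return a 0/1 mask that is 1 only at token positions inside a target span."""
    mask = [0.0] * length
    for start, end in spans:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = 1.0
    return mask


def masked_logratio(policy_logps: List[float], ref_logps: List[float],
                    mask: List[float]) -> float:
    """Sum of (policy - reference) token log-probs over masked positions only."""
    return sum(m * (p - r) for p, r, m in zip(policy_logps, ref_logps, mask))


def targeted_preference_loss(pol_chosen: List[float], ref_chosen: List[float],
                             mask_chosen: List[float],
                             pol_rejected: List[float], ref_rejected: List[float],
                             mask_rejected: List[float],
                             beta: float = 0.1) -> float:
    """DPO-style loss (an assumption) computed on target tokens only."""
    margin = beta * (masked_logratio(pol_chosen, ref_chosen, mask_chosen)
                     - masked_logratio(pol_rejected, ref_rejected, mask_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy example: one target token in each response.
sample = PreferenceSample(objects_in_image=["dog"],
                          chosen_spans=[(1, 2)], rejected_spans=[(0, 1)])
ref_c, ref_r = [-1.0, -2.0, -3.0], [-1.0, -1.0]
mask_c = span_mask(len(ref_c), sample.chosen_spans)
mask_r = span_mask(len(ref_r), sample.rejected_spans)
loss = targeted_preference_loss([-1.0, -1.0, -3.0], ref_c, mask_c,
                                ref_r, ref_r, mask_r)
print(loss)
```

Under these assumptions, a change in the policy's log-probability at a token outside the target spans leaves the loss unchanged, which is one way to read "filter out irrelevant signals"; only the annotated chunks produce a gradient signal.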
Taxonomy
Topics: Hallucinations in medical conditions · Leprosy Research and Treatment · Drug-Induced Ocular Toxicity
