FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Rui Xiao; Sanghwan Kim; Yongqin Xian; Zeynep Akata; Stephan Alaniz

arXiv:2603.17662·cs.CV·March 19, 2026

Open Access

TL;DR

This paper introduces FINER, a set of benchmarks and a tuning method for analyzing and reducing hallucinations in multimodal large language models (MLLMs) when they answer fine-grained negative queries.

Contribution

The paper presents the FINER benchmarks (FINER-CompreCap and FINER-DOCCI) and FINER-Tuning, a fine-tuning approach based on Direct Preference Optimization (DPO) that reduces hallucinations in MLLMs on fine-grained queries.

Findings

01

Finetuning with FINER-Tuning yields gains of up to 24.2% (InternVL3.5-14B) on hallucinations measured by the FINER benchmarks.

02

The benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with elements that are genuinely present in the image.

03

Finetuning improves performance across multiple existing hallucination benchmarks.
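To make the second finding concrete, here is a toy sketch of what a fine-grained negative query could look like: a question that lists several attributes that really are in the image, plus exactly one that is not. The function name, question template, and example values are hypothetical and are not taken from FINER-CompreCap or FINER-DOCCI.

```python
def build_negative_query(obj, present_attrs, swap_index, wrong_attr):
    """Build a yes/no question with one mismatched attribute.

    Hypothetical illustration only: the template and field names are
    assumptions, not the benchmark's actual format.
    """
    attrs = list(present_attrs)      # copy so the caller's list is untouched
    attrs[swap_index] = wrong_attr   # inject a single fine-grained mismatch
    question = f"Is there a {' '.join(attrs)} {obj} in the image?"
    # Every other attribute is genuinely present, so the correct answer
    # hinges on noticing the one swapped detail.
    return {"question": question, "answer": "no"}


# Suppose the image shows a small brown fluffy dog (made-up example).
example = build_negative_query("dog", ["small", "brown", "fluffy"], 1, "white")
print(example["question"])
print(example["answer"])
```

A model that attends only to the matching elements ("small", "fluffy", "dog") and answers "yes" would be hallucinating in the sense the finding describes.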

Abstract

Multimodal large language models (MLLMs) struggle with hallucinations, particularly with fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, leveraging Direct Preference Optimization (DPO) on FINER-inspired data. Finetuning four frontier MLLMs with FINER-Tuning yields up to 24.2% gains (InternVL3.5-14B) on hallucinations from our benchmarks, while simultaneously improving performance on eight existing…
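The abstract names Direct Preference Optimization but this excerpt gives no training details. As a rough illustration of the standard DPO objective for a single preference pair (not the authors' implementation), the sketch below computes the loss from sequence log-probabilities under the policy being tuned and a frozen reference model. The function names, the scalar inputs, and the default `beta=0.1` are assumptions made for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response given the
    same prompt. beta is an assumed value, not one reported in the source.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more
    # strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Made-up log-probabilities: the policy already leans toward the chosen answer.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In a hallucination setting, the chosen response would plausibly be the one that correctly rejects the mismatched detail and the rejected response the one that affirms it, though how FINER-Tuning actually builds its preference pairs is not stated in this excerpt.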

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning