Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing

Yi Liu; Jing Zhang; Di Wang; Xiaoyu Tian; Haonan Guo; Bo Du

arXiv:2603.02754·cs.CV·March 4, 2026

TL;DR

This paper introduces RSHBench, a benchmark for diagnosing factual and logical hallucinations in multimodal LLMs (MLLMs) on remote sensing tasks, and proposes RADAR, a training-free inference method that reduces these hallucinations and improves remote sensing visual question-answering performance.

Contribution

The paper presents RSHBench for systematic diagnosis and RADAR for inference-time hallucination mitigation in multimodal LLMs for remote sensing.

Findings

01. RADAR reduces both factual and logical hallucinations in MLLMs on remote sensing tasks.

02. RADAR improves performance on remote sensing visual question-answering (RS-VQA).

03. Experiments across diverse MLLMs show that these improvements are consistent.

Abstract

Multimodal large language models (MLLMs) suffer from pronounced hallucinations in remote sensing visual question-answering (RS-VQA), primarily caused by visual grounding failures in large-scale scenes or misinterpretation of fine-grained small targets. To systematically analyze these issues, we introduce RSHBench, a protocol-based benchmark for fine-grained diagnosis of factual and logical hallucinations. To mitigate grounding-induced factual hallucinations, we further propose Relative Attention-Driven Actively Reasoning (RADAR), a training-free inference method that leverages intrinsic attention in MLLMs to guide progressive localization and fine-grained local reasoning at test time. Extensive experiments across diverse MLLMs demonstrate that RADAR consistently improves RS-VQA performance and reduces both factual and logical hallucinations. Code and data will be publicly available at:…
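
The abstract describes RADAR only at a high level: attention from the MLLM guides progressive localization, followed by local reasoning on the localized region. As a rough illustration of what attention-guided progressive localization could look like, here is a minimal sketch. It is not the authors' implementation. The abstract does not say how the attention map is computed or what "relative" refers to, so the thresholding rule, the `keep` parameter, the number of steps, and every function name below are hypothetical. The `get_attention` callback stands in for whatever model call would return a patch-level attention grid for a given image region.

```python
from typing import Callable, List, Tuple

# (left, top, right, bottom) in normalized image coordinates, 0.0 to 1.0
Box = Tuple[float, float, float, float]
# Non-negative attention weights over a rows x cols patch grid
Grid = List[List[float]]


def attention_box(attn: Grid, keep: float = 0.5) -> Tuple[int, int, int, int]:
    """Return the smallest end-exclusive cell box (r0, c0, r1, c1) covering
    every cell whose weight is at least `keep` times the maximum weight.
    The threshold rule is an assumption made for this sketch."""
    peak = max(max(row) for row in attn)
    hot = [
        (r, c)
        for r, row in enumerate(attn)
        for c, w in enumerate(row)
        if w >= keep * peak
    ]
    rows = [r for r, _ in hot]
    cols = [c for _, c in hot]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def zoom(region: Box, cell_box: Tuple[int, int, int, int],
         shape: Tuple[int, int]) -> Box:
    """Map a cell box on the grid laid over `region` back to image coordinates."""
    left, top, right, bottom = region
    r0, c0, r1, c1 = cell_box
    n_rows, n_cols = shape
    width, height = right - left, bottom - top
    return (
        left + width * c0 / n_cols,
        top + height * r0 / n_rows,
        left + width * c1 / n_cols,
        top + height * r1 / n_rows,
    )


def progressive_localize(get_attention: Callable[[Box], Grid],
                         steps: int = 2, keep: float = 0.5) -> Box:
    """Start from the full image and repeatedly narrow the region to the
    high-attention box. `get_attention` is a hypothetical stand-in for a
    model call that returns an attention grid for the given region."""
    region: Box = (0.0, 0.0, 1.0, 1.0)
    for _ in range(steps):
        attn = get_attention(region)
        cell_box = attention_box(attn, keep)
        region = zoom(region, cell_box, (len(attn), len(attn[0])))
    return region


# Toy 4x4 attention grid with a hot spot in the second row.
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.0],
    [0.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
first_region = zoom((0.0, 0.0, 1.0, 1.0), attention_box(toy), (4, 4))
```

In a real pipeline, the returned region would be cropped from the image and passed back to the model together with the question, so that small targets occupy more of the visual input. How RADAR performs that local reasoning step is not described in the abstract.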

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning