UHR-Micro: Diagnosing and Mitigating the Resolution Illusion in Earth Observation VLMs

Shuo Ni; Tong Wang; Jing Zhang; He Chen; Haonan Guo; Ning Zhang; Bo Du

arXiv:2605.12237 · cs.CV · May 13, 2026

TL;DR

This paper introduces UHR-Micro, a benchmark and diagnostic platform for evaluating and improving Earth observation vision-language models on ultra-high-resolution imagery. It targets the "resolution illusion": higher input resolution does not guarantee reliable perception of micro-scale targets.

Contribution

The paper presents UHR-Micro, a benchmark of 11,253 instructions grounded in 1,212 UHR images for micro-level reasoning in Earth observation VLMs, and proposes MAP, an active perception agent that improves micro-evidence grounding.

Findings

01  High-resolution VLMs often fail at spatial grounding despite detailed inputs.

02  Increasing model capacity does not fully resolve micro-evidence perception issues.

03  MAP improves micro-level perception by actively seeking and grounding evidence.

Abstract

Vision-Language Models (VLMs) increasingly operate on ultra-high-resolution (UHR) Earth observation imagery, yet they remain vulnerable to a severe scale mismatch between large-scale scene context and micro-scale targets. We refer to this empirical gap as a "resolution illusion": higher input resolution provides the appearance of richer visual detail, but does not necessarily yield reliable perception of spatially small, task-relevant evidence. To benchmark this challenge, we introduce UHR-Micro, a benchmark comprising 11,253 instructions grounded in 1,212 UHR images, designed to evaluate VLMs at the spatial limits of native Earth observation imagery. UHR-Micro spans diverse micro-target scales, context requirements, task families, and visual conditions, and provides diagnostic annotations that support controlled evaluation and fine-grained error attribution. Experiments with…
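
As a rough illustration of the scale mismatch described above, the sketch below estimates how small a micro-target becomes relative to the full scene, and how few pixels it would keep if the whole image were uniformly resized to a fixed model input. This is an illustrative assumption only: the abstract does not say that global resizing causes the resolution illusion, and the function names and example sizes are hypothetical, not taken from the paper or its repository.

```python
def target_area_fraction(image_side_px: int, target_side_px: float) -> float:
    """Fraction of a square image's area covered by a square target."""
    if image_side_px <= 0 or target_side_px < 0:
        raise ValueError("image side must be positive and target side non-negative")
    return (target_side_px / image_side_px) ** 2


def target_side_after_resize(
    image_side_px: int, target_side_px: float, model_input_side_px: int
) -> float:
    """Side length in pixels of the target after the whole image is
    uniformly resized to model_input_side_px on each side."""
    if image_side_px <= 0 or model_input_side_px <= 0:
        raise ValueError("image and model input sides must be positive")
    scale = model_input_side_px / image_side_px
    return target_side_px * scale


if __name__ == "__main__":
    # Hypothetical numbers: an 8192 x 8192 scene, a 16 x 16 pixel target,
    # and a model that ingests a 1024 x 1024 view of the whole scene.
    fraction = target_area_fraction(8192, 16)
    resized_side = target_side_after_resize(8192, 16, 1024)
    print(f"target covers {fraction:.2e} of the scene")
    print(f"target side after resize: {resized_side} px")
```

Under these hypothetical sizes the target covers well under one thousandth of a percent of the scene and shrinks to about two pixels per side after resizing, which is one plausible way small, task-relevant evidence could be lost even when the source image is very detailed.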

Code & Models

Repositories

MiliLab/UHR-Micro (GitHub)