UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images

Siqi Li; Xinyu Cai; Jianbiao Mei; Nianchen Deng; Pinlong Cai; Licheng Wen; Yufan Shen; Xuemeng Yang; Botian Shi; Yong Liu

arXiv:2601.08748 · cs.CV · January 14, 2026

TL;DR

UR-Bench is a new benchmark designed to evaluate multimodal large language models' reasoning abilities on ultra-high-resolution images, addressing a gap in visual reasoning evaluation for complex visual data.

Contribution

We introduce UR-Bench, a comprehensive ultra-high-resolution image reasoning benchmark, along with an agent-based framework and tools for efficient processing, advancing visual reasoning evaluation.

Findings

01. State-of-the-art models show limited reasoning on ultra-high-resolution images.

02. Our framework improves reasoning efficiency and accuracy on ultra-high-resolution data.

03. UR-Bench enables detailed evaluation of visual reasoning capabilities.

Abstract

Recent multimodal large language models (MLLMs) show strong capabilities in visual-language reasoning, yet their performance on ultra-high-resolution imagery remains largely unexplored. Existing visual question answering (VQA) benchmarks typically rely on medium-resolution data, which offers limited visual complexity. To bridge this gap, we introduce the Ultra-high-resolution Reasoning Benchmark (UR-Bench), a benchmark designed to evaluate the reasoning capabilities of MLLMs when faced with extreme amounts of visual information. UR-Bench comprises two major categories, Humanistic Scenes and Natural Scenes, covering four subsets of ultra-high-resolution images with distinct spatial structures and data sources. Each subset contains images ranging from hundreds of megapixels to gigapixels, accompanied by questions organized into three levels, enabling evaluation of models' reasoning capabilities in…
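To give a rough sense of the scale involved, the sketch below counts how many fixed-size crops would be needed to cover an image of a given size. It is only an illustration of the arithmetic: the 1024-pixel tile size, the optional overlap, the function name `tile_count`, and the 40000 x 25000 example dimensions are all assumptions made here, not values or methods taken from the paper.

```python
import math


def tile_count(width: int, height: int, tile: int = 1024, overlap: int = 0) -> int:
    """Number of tile x tile crops needed to cover a width x height image.

    Hypothetical helper for illustration; tile size and overlap are assumed
    values, not parameters reported for UR-Bench.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap  # distance between the left edges of adjacent tiles
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols * rows


# Assumed example: a 40000 x 25000 image, i.e. exactly one gigapixel.
width, height = 40_000, 25_000
megapixels = width * height / 1e6
print(f"{megapixels:.0f} MP -> {tile_count(width, height)} tiles of 1024 x 1024")
```

Under these assumptions a single gigapixel image already splits into about a thousand crops, which suggests why feeding such an image to a model in one pass, or exhaustively crop by crop, is costly.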

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques