GeoR-Bench: Evaluating Geoscience Visual Reasoning

Yushuo Zheng; Zicheng Zhang; Huiyu Duan; Chunyi Li; Zijian Chen; Ziheng Jia; Yue Shi; Ke Gu; Xiongkuo Min; Guangtao Zhai

arXiv:2605.11541 · cs.CV · May 13, 2026

TL;DR

GeoR-Bench is a comprehensive benchmark for evaluating how well AI models reason over geoscience visual data. It highlights current models' limitations in understanding Earth science processes.

Contribution

The paper presents GeoR-Bench, a new benchmark with diverse geoscience tasks and evaluation criteria for assessing and improving AI reasoning in geoscience applications.

Findings

1. Current models achieve low accuracy: the best model reaches only 42.7% strict accuracy.
2. The visual quality of model outputs often exceeds their scientific reasoning accuracy.
3. Geoscience reasoning remains a significant challenge for AI models.

Abstract

Geoscience intelligence is expected to understand, reason about, and predict Earth system changes to support human decision-making in critical domains such as disaster response, climate adaptation, and environmental protection. Although current research has shown promising progress on specific geoscience tasks, such as remote sensing interpretation and geographic question answering, existing benchmarks remain largely task-specific and fail to capture open-ended, real-world geoscience problems. As a result, it remains unclear how far current AI systems are from achieving genuine geoscience intelligence. To address this gap, we present GeoR-Bench, a Benchmark for evaluating Geoscience visual Reasoning through reasoning-informed visual editing tasks. GeoR-Bench contains 440 curated samples spanning 6 geoscience categories and 24 task types,…
