LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation

Jun Wang; Fengpeng Li; Hang Dong; Tianjin Huang; Wei Han

arXiv:2605.07640·cs.CV·May 11, 2026

TL;DR

LithoBench is a multi-level benchmark for evaluating how well large multimodal models interpret lithology from remote-sensing imagery, combining expert annotations with multi-level geological semantics.

Contribution

The paper introduces LithoBench, a new multi-level benchmark with expert-annotated data and a semi-automated construction pipeline for assessing geological semantic understanding in models.

Findings

01. Large vision-language models show significant limitations in geological understanding.

02. Higher-order reasoning tasks remain challenging for current models.

03. LithoBench provides a structured evaluation across five cognitive levels.

Abstract

Remote-sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, it is a knowledge-intensive task: experts must infer rock types from subtle visual, spectral, textural, geomorphological, and contextual cues, which makes reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opportunities, yet their evaluation remains constrained by the lack of benchmarks that capture lithological annotations, multi-level geological semantics, and expert-informed assessment. Here, we propose LithoBench, a multi-level benchmark for evaluating geological semantic understanding in remote-sensing lithology interpretation. LithoBench contains 10,000 expert-annotated interpretation instances…
