LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
Jun Wang, Fengpeng Li, Hang Dong, Tianjin Huang, Wei Han

TL;DR
LithoBench is a multi-level benchmark for evaluating how well large multimodal models perform remote sensing lithology interpretation, built on expert annotations and multi-level geological semantics.
Contribution
The paper introduces LithoBench, a new multi-level benchmark with expert-annotated data and a semi-automated construction pipeline for assessing geological semantic understanding in models.
Findings
Large vision-language models show significant limitations in geological understanding.
Higher-order reasoning tasks remain challenging for current models.
LithoBench provides a structured evaluation across five cognitive levels.
Abstract
Remote sensing lithology interpretation is fundamental to geological surveys, mineral exploration, and regional geological mapping. Unlike general land-cover recognition, lithology interpretation is a knowledge-intensive task that requires experts to infer rock types from subtle visual, spectral, textural, geomorphological, and contextual cues, making reliable automated interpretation highly challenging. Geological knowledge-guided large multimodal models offer new opportunities, yet their evaluation remains constrained by the lack of benchmarks that capture lithological annotations, multi-level geological semantics, and expert-informed assessment. Here, we propose LithoBench, a multi-level benchmark for evaluating geological semantic understanding in remote sensing lithology interpretation. LithoBench contains 10,000 expert-annotated interpretation instances…
