Evaluating Tool-Augmented Agents in Remote Sensing Platforms
Simranjit Singh, Michael Fore, Dimitrios Stamoulis

TL;DR
This paper introduces GeoLLM-QA, a new benchmark for evaluating tool-augmented LLMs in remote sensing, focusing on realistic, user-grounded tasks involving visual, verbal, and interactive actions on a live UI platform.
Contribution
The paper presents GeoLLM-QA, a benchmark that better reflects real-world remote sensing tasks by incorporating interactive, multimodal action sequences, and it evaluates state-of-the-art LLMs on 1,000 diverse tasks.
Findings
State-of-the-art LLMs show limited performance on realistic RS tasks.
The benchmark reveals gaps in current LLM capabilities for complex, interactive remote sensing scenarios.
Insights suggest directions for developing more effective RS agents.
Abstract
Tool-augmented Large Language Models (LLMs) have shown impressive capabilities in remote sensing (RS) applications. However, existing benchmarks assume question-answering input templates over predefined image-text data pairs. These standalone instructions neglect the intricacies of realistic user-grounded tasks. Consider a geospatial analyst: they zoom in on a map area, they draw a region over which to collect satellite imagery, and they succinctly ask "Detect all objects here". Where is "here", if it is not explicitly hardcoded in the image-text template, but instead is implied by the system state, e.g., the live map positioning? To bridge this gap, we present GeoLLM-QA, a benchmark designed to capture long sequences of verbal, visual, and click-based actions on a real UI platform. Through in-depth evaluation of state-of-the-art LLMs over a diverse set of 1,000 tasks, we offer insights…
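The "here" problem described in the abstract can be illustrated with a minimal sketch. This is not the GeoLLM-QA implementation: `MapState`, `resolve_region`, `DEICTIC_WORDS`, and the bounding-box convention are all hypothetical names and choices introduced only for illustration. The sketch assumes the platform exposes the live viewport and any user-drawn region, and that an instruction containing a deictic word should be grounded in that state rather than in the text alone.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed convention: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


@dataclass
class MapState:
    """Hypothetical snapshot of the UI state at the moment the user asks."""
    viewport: BBox                      # area currently visible on the map
    drawn_region: Optional[BBox] = None  # region the user drew, if any


# Assumed, deliberately tiny vocabulary of words that point at the UI state.
DEICTIC_WORDS = {"here", "there"}


def resolve_region(instruction: str, state: MapState) -> Optional[BBox]:
    """Return the region a deictic instruction refers to, or None.

    A drawn region takes priority over the viewport, mirroring the analyst
    who zooms, draws a region, and then asks about "here".
    """
    words = {w.strip(".,!?\"'").lower() for w in instruction.split()}
    if not words & DEICTIC_WORDS:
        return None  # nothing to ground; the text must name the place itself
    if state.drawn_region is not None:
        return state.drawn_region
    return state.viewport
```

With this sketch, "Detect all objects here" resolves to the drawn region when one exists and falls back to the viewport otherwise, while an instruction that names its own location returns `None`. A benchmark built on standalone image-text templates has no equivalent of `MapState`, which is the gap the abstract points to.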
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Multi-Agent Systems and Negotiation
Methods: Sparse Evolutionary Training
