GS-QA: A Benchmark for Geospatial Question Answering
Majid Saeedan, Muhammad Shihab Rashid, Ahmed Eldawy, Vagelis Hristidis

TL;DR
GS-QA is a comprehensive, extensible benchmark with 2,800 geospatial questions designed to evaluate LLMs' ability to handle complex spatial reasoning, multi-source data integration, and diverse answer types.
Contribution
This work introduces GS-QA, a novel large-scale geospatial QA benchmark with diverse question templates, multi-source reasoning, and a comprehensive evaluation methodology, addressing limitations of prior benchmarks.
Findings
LLMs perform well on simple spatial questions but struggle with complex predicates.
Accuracy drops significantly for questions requiring multi-source reasoning.
The performance of existing solutions highlights the need for further research in geospatial QA.
Abstract
Recent advances in Large Language Models (LLMs) have led to dramatic improvements in question answering (QA). To address the challenge of evaluating QA systems, standardized benchmarks have been introduced. This work focuses on the problem of geospatial QA, where a large collection of geospatial data is available as a spatial database or in other forms. Existing geospatial QA benchmarks have various limitations, including a small number of questions, limited spatial predicates, narrow output types, and no multi-source reasoning. We present GS-QA, an extensible geospatial QA benchmark with 2,800 question-answer pairs across 28 templates built on OpenStreetMap and Wikipedia data, covering a wide range of spatial objects, predicates (including directional and towards filtering), and answer types (entity names, locations, distances, directions, counts, and aggregated…
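The abstract lists distances and directions among GS-QA's answer types. How the benchmark computes its reference answers is not described here, so the following is only a minimal sketch under stated assumptions: points are latitude/longitude pairs in degrees, the Earth is treated as a sphere, distance is the haversine great-circle distance, and direction is the initial bearing mapped onto an eight-point compass. All function names are hypothetical and are not taken from GS-QA.

```python
import math

# Assumption: spherical Earth with mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088

# Assumption: directions are reported on an eight-point compass.
COMPASS_8 = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def compass_direction(bearing):
    """Map a bearing in degrees to the nearest of eight compass points."""
    # Each sector is 45 degrees wide and centred on its compass point.
    return COMPASS_8[int((bearing + 22.5) // 45) % 8]


if __name__ == "__main__":
    # One degree of longitude along the equator: about 111.2 km, heading east.
    print(round(haversine_km(0, 0, 0, 1), 1))
    print(compass_direction(initial_bearing_deg(0, 0, 0, 1)))
```

A spherical model keeps the sketch short; a benchmark that needs higher precision would more likely use ellipsoidal (geodesic) distances, and the number of compass sectors is likewise a design choice rather than something the abstract specifies.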
