TL;DR
This paper analyzes geospatial web search queries at scale, revealing that most are transactional and practical, often outside traditional GIS scope, and provides a labeled dataset and taxonomy.
Contribution
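It presents a large-scale analysis of geospatial web search queries, combining dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering over the full MS MARCO corpus. The result is a taxonomy of 88 query categories, released with datasets for future research.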
Findings
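18.0% of queries (181,827 of 1.01 million) are geospatial, nearly three times the 6.17% labelled as Location in the original annotations.
Transactional and practical lookups dominate: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme.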
Many queries require real-time or generative systems beyond traditional GIS.
Abstract
Web search queries concern place far more often than existing labelling schemes suggest, yet the landscape of geospatial web search queries - what people ask of place, and how often - remains poorly characterised at scale. We apply dense sentence embeddings, a lightweight SetFit classifier, and density-based clustering to the full MS MARCO corpus of 1.01 million real Bing queries without prior filtering for toponyms or spatial keywords, identifying 181,827 geospatial queries (18.0%), nearly threefold the 6.17% labelled as Location in the original annotations. The resulting taxonomy of 88 query categories reveals that geospatial web search is dominated by transactional and practical lookups: costs and prices alone account for 15.3% of geospatial queries, nearly twice the size of the entire physical geography theme. Much of this activity - costs, opening hours, contact details, weather,…
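The headline proportions in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the abstract gives the corpus size as "1.01 million", so the exact total used here is an assumption, and the function and constant names are hypothetical.

```python
# Figures quoted in the abstract. The corpus size is stated only as
# "1.01 million", so TOTAL_QUERIES is an approximation (assumption).
TOTAL_QUERIES = 1_010_000
GEOSPATIAL_QUERIES = 181_827
ORIGINAL_LOCATION_PCT = 6.17   # share labelled "Location" in the original annotations
COST_PCT_OF_GEOSPATIAL = 15.3  # costs and prices, as a share of geospatial queries


def summarize_shares():
    """Recompute the abstract's headline proportions from its raw figures."""
    geo_pct = 100 * GEOSPATIAL_QUERIES / TOTAL_QUERIES
    # How many times larger the new geospatial share is than the original label share.
    ratio_to_original = geo_pct / ORIGINAL_LOCATION_PCT
    # Approximate number of cost/price queries implied by the 15.3% figure.
    cost_queries = GEOSPATIAL_QUERIES * COST_PCT_OF_GEOSPATIAL / 100
    return {
        "geo_pct": geo_pct,
        "ratio_to_original": ratio_to_original,
        "cost_queries": cost_queries,
    }


if __name__ == "__main__":
    for name, value in summarize_shares().items():
        print(f"{name}: {value:.2f}")
```

Under the assumed total, the geospatial share rounds to 18.0% and is roughly 2.9 times the original 6.17%, consistent with the "nearly threefold" wording. The implied count of cost and price queries is an estimate derived from the rounded 15.3% figure, not a number reported in the abstract.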
