Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants
Ryan Mioduski

TL;DR
This paper evaluates large language models' ability to convert historical land grant descriptions into accurate geographic coordinates, demonstrating their potential for scalable and cost-effective historical georeferencing.
Contribution
It introduces a new benchmark dataset and systematically compares LLMs against traditional geocoding methods for historical land records.
Findings
The top LLM achieved a mean error of 23 km, outperforming the baselines.
Ensemble methods further reduced error to 19.2 km.
Lower-cost models maintained high accuracy.
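The errors above are distances in kilometers between a predicted point and a verified ground-truth point. As a minimal sketch of how such a mean error could be computed, the following assumes great-circle (haversine) distance on a spherical Earth; the summary does not state which distance formula the paper uses, and the function names `haversine_km` and `mean_error_km` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_error_km(predicted, truth):
    """Average haversine distance over paired (lat, lon) predictions and ground truth."""
    if not predicted or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and of equal length")
    distances = [haversine_km(*p, *t) for p, t in zip(predicted, truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Illustrative coordinates only; these are not values from the benchmark.
    predicted = [(37.0, -77.0), (37.5, -77.5)]
    truth = [(37.1, -77.0), (37.5, -77.3)]
    print(f"mean error: {mean_error_km(predicted, truth):.1f} km")
```

A mean is sensitive to a few large misses, so a median or a cumulative error curve computed over the same distance list would give a complementary view.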
Abstract
Virginia's seventeenth- and eighteenth-century land patents survive primarily as narrative metes-and-bounds descriptions, limiting spatial analysis. This study systematically evaluates current-generation large language models (LLMs) in converting these prose abstracts into geographically accurate latitude/longitude coordinates within a focused evaluation context. A digitized corpus of 5,471 Virginia patent abstracts (1695-1732) is released, with 43 rigorously verified test cases serving as an initial, geographically focused benchmark. Six OpenAI models across three architectures (o-series, GPT-4-class, and GPT-3.5) were tested under two paradigms: direct-to-coordinate and tool-augmented chain-of-thought invoking external geocoding APIs. Results were compared against a GIS analyst baseline, Stanford NER geoparser, Mordecai-3 neural geoparser, and a county-centroid heuristic. The top…
