Georeferencing complex relative locality descriptions with large language models
Aneesha Fernando, Surangika Ranathunga, Kristin Stock, Raj Prasanna, Christopher B. Jones

TL;DR
This paper demonstrates that fine-tuned Large Language Models can automatically georeference complex locality descriptions in biodiversity records with high accuracy, outperforming existing methods especially for lengthy and intricate texts.
Contribution
The study introduces a novel approach that uses LLMs fine-tuned with QLoRA to georeference complex locality descriptions, achieving significant accuracy improvements over baselines.
Findings
65% of records within a 10 km radius, on average across datasets
85% of records within 10 km in New York state
67% of records within 1 km in the best case
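The findings above are stated as the share of records whose predicted point falls within a distance threshold (1 km, 10 km) of the reference coordinates. The sketch below shows how such a metric can be computed. It assumes great-circle (haversine) distance on a spherical Earth with mean radius 6371.0088 km; the function names and sample coordinates are hypothetical and are not taken from the paper's code or data.

```python
import math

# Assumption: spherical Earth with mean radius in km (not specified by the paper).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predicted, gold, threshold_km):
    """Fraction of records whose predicted point lies within threshold_km of its gold point.

    Hypothetical helper: predicted and gold are equal-length lists of (lat, lon) tuples.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and of equal length")
    hits = sum(
        haversine_km(p[0], p[1], g[0], g[1]) <= threshold_km
        for p, g in zip(predicted, gold)
    )
    return hits / len(gold)


# Hypothetical example: the second prediction is 0.5 degrees of latitude (about 55.6 km) off.
gold = [(42.0, -76.0), (43.0, -75.0)]
predicted = [(42.0, -76.0), (43.5, -75.0)]
print(accuracy_within(predicted, gold, 10.0))  # 0.5
```

Reporting accuracy at several thresholds, rather than a single mean error, shows how many records are usable at a given precision, which matches how the findings are phrased.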
Abstract
Georeferencing text documents has typically relied on either gazetteer-based methods that assign geographic coordinates to place names, or language modelling approaches that associate textual terms with geographic locations. However, many location descriptions specify positions relative to other places using spatial relationships, so geocoding based solely on place names or geo-indicative words is inaccurate. This issue frequently arises in biological specimen collection records, where locations, particularly for specimens collected before GPS, are often described in narrative form rather than with coordinates. Accurate georeferencing is vital for biodiversity studies, yet the process remains labour-intensive, creating demand for automated georeferencing solutions. This paper explores the potential of Large Language Models (LLMs) to georeference complex locality descriptions automatically, focusing on the biodiversity…
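To illustrate why geocoding only the place name is inaccurate for relative descriptions, the sketch below resolves a description of the form "10 km N of <town>" by moving a given distance along a compass bearing from the town's coordinates (the standard spherical destination-point formula). This is not the paper's LLM-based method; it is a hand-written illustration that assumes a spherical Earth and a known distance and bearing. The town coordinates and function names are hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius in km.
EARTH_RADIUS_KM = 6371.0088


def destination_point(lat, lon, distance_km, bearing_deg):
    """Point reached by travelling distance_km from (lat, lon) on an initial bearing (degrees from north)."""
    phi1, lmb1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lmb2 = lmb1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    # Normalise longitude to the range [-180, 180).
    return math.degrees(phi2), (math.degrees(lmb2) + 540.0) % 360.0 - 180.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical locality: "10 km N of Townsville", with the town's gazetteer point at (42.0, -76.0).
town = (42.0, -76.0)
site = destination_point(town[0], town[1], 10.0, 0.0)
# A place-name-only geocoder would return the town point, roughly 10 km from the described site.
error_km = haversine_km(town[0], town[1], site[0], site[1])
print(site, round(error_km, 3))
```

Real locality descriptions often combine several such relations, vague distances, or directions along roads and rivers, which is why a fixed formula like this one does not generalise and motivates learned approaches.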
