Georeferencing complex relative locality descriptions with large language models

Aneesha Fernando; Surangika Ranathunga; Kristin Stock; Raj Prasanna; Christopher B. Jones

arXiv:2512.14228·cs.AI·January 26, 2026

TL;DR

This paper demonstrates that fine-tuned Large Language Models can automatically georeference complex locality descriptions in biodiversity records with high accuracy, outperforming existing methods especially for lengthy and intricate texts.

Contribution

The study introduces an approach that fine-tunes LLMs with QLoRA (quantized low-rank adaptation) to georeference complex locality descriptions, achieving significant accuracy improvements over baselines.

Findings

01  65% of records georeferenced within a 10 km radius, on average across datasets

02  85% of records georeferenced within a 10 km radius in New York state

03  67% of records georeferenced within a 1 km radius in the best case
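
The findings above are "share of records within a distance threshold" figures. A minimal sketch of how such a figure can be computed follows, assuming the error is measured as great-circle (haversine) distance between predicted and reference coordinates on a spherical Earth. The function names, the radius constant, and the sample coordinates are illustrative assumptions and are not taken from the paper, which may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(predicted, reference, threshold_km):
    """Share of records whose predicted point lies within threshold_km of the reference point."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one record is required")
    hits = sum(
        1
        for (plat, plon), (rlat, rlon) in zip(predicted, reference)
        if haversine_km(plat, plon, rlat, rlon) <= threshold_km
    )
    return hits / len(predicted)


# Made-up coordinates purely for illustration (not data from the paper).
predicted = [(0.0, 0.0), (0.0, 0.05), (0.0, 1.0)]
reference = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
share_10km = fraction_within(predicted, reference, threshold_km=10.0)
```

With these made-up points, two of the three predictions fall inside the 10 km threshold; the same function with `threshold_km=1.0` would give the stricter 1 km figure.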

Abstract

Georeferencing text documents has typically relied on either gazetteer-based methods to assign geographic coordinates to place names, or on language modelling approaches that associate textual terms with geographic locations. However, many location descriptions specify positions relatively with spatial relationships, making geocoding based solely on place names or geo-indicative words inaccurate. This issue frequently arises in biological specimen collection records, where locations are often described through narratives rather than coordinates if they pre-date GPS. Accurate georeferencing is vital for biodiversity studies, yet the process remains labour-intensive, leading to a demand for automated georeferencing solutions. This paper explores the potential of Large Language Models (LLMs) to georeference complex locality descriptions automatically, focusing on the biodiversity…
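
To illustrate why place names alone can mislead, consider a description such as "5 km north of a town". The sketch below is not the paper's method (which uses fine-tuned LLMs). It only shows, under a spherical-Earth assumption, how far the described point sits from the coordinates a name-only lookup would return. The function name `offset_point` and the town coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical-Earth assumption)


def offset_point(lat, lon, bearing_deg, distance_km):
    """Point reached by travelling distance_km from (lat, lon) along bearing_deg.

    Bearing is measured clockwise from north; inputs and outputs are in degrees.
    """
    phi1 = math.radians(lat)
    lmb1 = math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    sin_phi2 = (
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    phi2 = math.asin(sin_phi2)
    lmb2 = lmb1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * sin_phi2,
    )
    return math.degrees(phi2), math.degrees(lmb2)


# Hypothetical gazetteer entry for a made-up town; not taken from the paper.
town = (-41.0, 175.0)

# "5 km north of the town": a name-only geocoder would return `town` itself,
# whereas the described location is offset from it.
described = offset_point(town[0], town[1], bearing_deg=0.0, distance_km=5.0)
```

Real locality descriptions are rarely this clean: they chain several relations, use vague distances, or refer to features such as roads and rivers, which is the kind of complexity the abstract says motivates a learned approach.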
