Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing

Kalana Wijegunarathna; Kristin Stock; Christopher B. Jones

arXiv:2507.08575 · cs.AI · July 14, 2025

TL;DR

This paper introduces a novel multi-modal approach that uses Large Multi-Modal Models (LMMs) to improve the georeferencing of biological sample locations by visually contextualizing spatial relations, achieving an average distance error of about 1 km on a small, manually annotated dataset.

Contribution

It presents a new zero-shot multi-modal method leveraging maps within LMMs for georeferencing, outperforming existing uni-modal and traditional tools.

Findings

01. Achieved an average distance error of approximately 1 km

02. Demonstrated the effectiveness of visual contextualization in georeferencing

03. Proposed a practical framework for integrating the method into workflows
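Accuracy here is reported as an average distance error. A common way to compute such a metric is the mean great-circle distance between each predicted point and its ground-truth coordinate. The sketch below assumes exactly that, using the haversine formula on a spherical Earth. This page does not state the authors' exact distance formula, and the names `haversine_km` and `average_distance_error_km` are illustrative.

```python
import math
from typing import Sequence, Tuple

# Mean Earth radius in km; an assumption, not taken from the paper.
EARTH_RADIUS_KM = 6371.0088

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points, via the haversine formula."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 so floating-point drift cannot push asin outside its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def average_distance_error_km(predicted: Sequence[LatLon], truth: Sequence[LatLon]) -> float:
    """Mean great-circle distance between paired predicted and ground-truth points."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not predicted:
        raise ValueError("at least one pair is required")
    return sum(haversine_km(p, t) for p, t in zip(predicted, truth)) / len(predicted)
```

For example, one exact hit plus one prediction a single degree of latitude (about 111.2 km) off gives an average of roughly 55.6 km. Because a mean is sensitive to a few large misses, a median is often reported alongside it.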

Abstract

Millions of biological sample records collected over the last few centuries and archived in natural history collections are un-georeferenced. Georeferencing the complex locality descriptions associated with these collection samples is a highly labour-intensive task that collection agencies struggle with. None of the existing automated methods exploit maps, which are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits the multi-modal capabilities of recent Large Multi-Modal Models (LMMs). This method enables the model to visually contextualize the spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models to this task in a zero-shot setting. Our experiments, conducted on a small manually annotated dataset, show impressive results for our approach (~1 km Average…
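The abstract names a grid-based approach but does not describe it. The sketch below shows one plausible reading, purely as an illustration and not necessarily how the authors build or read their grid. A lettered and numbered grid is overlaid on a map image of known extent, the model answers with a cell label such as "B3", and that cell's centre is taken as the predicted coordinate. `GridSpec`, `cell_label`, `parse_label` and `cell_center` are hypothetical names. The linear latitude/longitude mapping also assumes an equirectangular map extent; a Web Mercator basemap would need a latitude transform instead.

```python
import string
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical grid laid over a map image with a known geographic extent (degrees)."""
    north: float
    south: float
    west: float
    east: float
    rows: int  # at most 26, since rows are lettered A-Z
    cols: int


def cell_label(row: int, col: int) -> str:
    """Label a cell: rows lettered A, B, ... from the top; columns numbered 1, 2, ... from the left."""
    return f"{string.ascii_uppercase[row]}{col + 1}"


def parse_label(label: str, spec: GridSpec) -> Tuple[int, int]:
    """Turn a model answer such as 'B3' into zero-based (row, col), rejecting out-of-grid cells."""
    label = label.strip().upper()
    if len(label) < 2:
        raise ValueError(f"malformed cell label {label!r}")
    row = string.ascii_uppercase.index(label[0])  # raises ValueError for a non-letter
    col = int(label[1:]) - 1                      # raises ValueError for a non-number
    if not (0 <= row < spec.rows and 0 <= col < spec.cols):
        raise ValueError(f"cell {label!r} is outside the {spec.rows}x{spec.cols} grid")
    return row, col


def cell_center(label: str, spec: GridSpec) -> Tuple[float, float]:
    """Return the (lat, lon) centre of a labelled cell, assuming lat/lon vary linearly across the image."""
    row, col = parse_label(label, spec)
    cell_height = (spec.north - spec.south) / spec.rows
    cell_width = (spec.east - spec.west) / spec.cols
    lat = spec.north - (row + 0.5) * cell_height
    lon = spec.west + (col + 0.5) * cell_width
    return lat, lon
```

With `GridSpec(north=10.0, south=6.0, west=20.0, east=24.0, rows=4, cols=4)`, each cell spans one degree and `cell_center("B3", spec)` returns `(8.5, 22.5)`. Under this reading, the worst-case error from snapping to a cell centre is half the cell diagonal, so the grid resolution bounds the achievable accuracy.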
