Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing
Kalana Wijegunarathna, Kristin Stock, Christopher B. Jones

TL;DR
This paper introduces a novel multi-modal approach that uses Large Multi-Modal Models (LMMs) to georeference biological sample locations by visually contextualizing the spatial relations in their locality descriptions, achieving an average distance error of about 1 km on a small manually annotated dataset.
Contribution
It presents a new zero-shot multi-modal method leveraging maps within LMMs for georeferencing, outperforming existing uni-modal and traditional tools.
Findings
Achieved an average distance error of approximately 1 km on a small manually annotated dataset
Demonstrated the effectiveness of visual contextualization in georeferencing
Proposed a practical framework for integrating the method into workflows
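To make the headline metric concrete, the following is a minimal sketch of how an average distance error could be computed between predicted and reference coordinates. It assumes the great-circle (haversine) distance on a spherical Earth; the summary does not say which distance formula the authors used, and the function names here are hypothetical.

```python
import math

# Mean Earth radius in kilometres (spherical approximation, an assumption).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_error_km(predicted, truth):
    """Average haversine distance over paired (lat, lon) tuples."""
    if not predicted or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and the same length")
    total = sum(haversine_km(*p, *t) for p, t in zip(predicted, truth))
    return total / len(predicted)
```

A single number like this hides the spread of errors, so in practice it is usually worth reporting the median or a distribution alongside the mean.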
Abstract
Millions of biological sample records collected over the last few centuries and archived in natural history collections are un-georeferenced. Georeferencing the complex locality descriptions associated with these collection samples is a highly labour-intensive task that collection agencies struggle with. None of the existing automated methods exploit maps, which are an essential tool for georeferencing complex relations. We present preliminary experiments and results of a novel method that exploits the multi-modal capabilities of recent Large Multi-Modal Models (LMMs). This method enables the model to visually contextualize the spatial relations it reads in the locality description. We use a grid-based approach to adapt these auto-regressive models for this task in a zero-shot setting. Our experiments conducted on a small manually annotated dataset show impressive results for our approach (1 km Average…
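The grid-based idea can be illustrated with a short sketch. The abstract does not describe the grid layout, labelling scheme, or prompt, so everything below is an assumption made for illustration: cells are labelled with a column letter and a row number (for example "C4"), row 1 is the northernmost row, the map covers a small area so latitude and longitude are interpolated linearly, and the names `BBox`, `parse_cell`, and `cell_center` are hypothetical. The model's text reply is parsed for a cell label, and the cell centre is returned as the coordinate estimate.

```python
import re
from typing import NamedTuple


class BBox(NamedTuple):
    """Geographic extent of the map image, in degrees (hypothetical helper)."""
    south: float
    west: float
    north: float
    east: float


def parse_cell(reply: str, rows: int, cols: int) -> tuple:
    """Return the (row, col) of the first valid cell label such as 'C4' in a reply."""
    for match in re.finditer(r"\b([A-Z])(\d{1,2})\b", reply.upper()):
        col = ord(match.group(1)) - ord("A")
        row = int(match.group(2)) - 1
        if 0 <= row < rows and 0 <= col < cols:
            return row, col
    raise ValueError("no valid grid cell label found in reply")


def cell_center(row: int, col: int, bbox: BBox, rows: int, cols: int) -> tuple:
    """Centre (lat, lon) of a cell; row 0 is the top (northern) row of the grid."""
    lat = bbox.north - (row + 0.5) * (bbox.north - bbox.south) / rows
    lon = bbox.west + (col + 0.5) * (bbox.east - bbox.west) / cols
    return lat, lon
```

With this layout the worst-case error inside a correctly chosen cell is half the cell diagonal, so the grid resolution bounds the achievable accuracy; a finer grid, or a second pass that re-grids the chosen cell, would tighten that bound at the cost of a harder selection task for the model.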
