Subnational Geocoding of Global Disasters Using Large Language Models
Michele Ronco, Damien Delforge, Wiebke S. Jäger, Christina Corbane

TL;DR
This paper introduces an automated workflow that uses GPT-4o to geocode disaster event locations from unstructured text. It cross-checks multiple geoinformation sources to produce subnational geometries with a reliability score, and it scales to large disaster datasets.
Contribution
The paper presents a novel fully automated LLM-assisted method for geocoding disaster locations that requires no manual intervention and cross-verifies multiple geographic sources.
Findings
Geocoded 14,215 EM-DAT disaster events from 2000 to 2024 across 17,948 unique locations.
Achieved high reliability scores by cross-checking GADM, OpenStreetMap, and Wikidata.
Demonstrated scalability and applicability across all disaster types.
Abstract
Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, which makes them difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types,…
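The idea of scoring a location by the agreement and availability of its sources can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 0 to 3 scale, the use of bounding boxes in place of full geometries, the intersection-over-union test, the 0.5 threshold, and the function names `bbox_iou` and `reliability_score`.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a geometry: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


def bbox_iou(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two axis-aligned bounding boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reliability_score(
    candidates: Dict[str, Optional[BBox]], iou_threshold: float = 0.5
) -> int:
    """Assumed 0-3 scale, for illustration only.

    0: no source returned a geometry
    1: at least one geometry, but no two sources agree
    2: at least one pair of sources agrees
    3: all three sources returned a geometry and every pair agrees
    """
    found = [box for box in candidates.values() if box is not None]
    if not found:
        return 0
    agreeing_pairs = sum(
        1 for a, b in combinations(found, 2) if bbox_iou(a, b) >= iou_threshold
    )
    if agreeing_pairs == 0:
        return 1
    if len(found) == 3 and agreeing_pairs == 3:
        return 3
    return 2


# Example: two sources found the place and overlap strongly; one found nothing.
example = {
    "GADM": (0.0, 0.0, 10.0, 10.0),
    "OpenStreetMap": (0.0, 0.0, 10.0, 9.0),
    "Wikidata": None,
}
score = reliability_score(example)
```

A real pipeline would compare actual polygons rather than bounding boxes, and the authors' scoring rules may differ substantially from this scale.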
Taxonomy
Topics: Geographic Information Systems Studies · Data-Driven Disease Surveillance · Public Relations and Crisis Communication
