The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps
Jina Kim, Zekun Li, Yijun Lin, Min Namgung, Leeje Jang, Yao-Yi Chiang

TL;DR
The mapKurator system provides an end-to-end pipeline for extracting, linking, and processing text from large historical map scans, facilitating data analysis and improving accessibility of geographic information.
Contribution
The paper introduces a comprehensive system that automates the extraction and linkage of text from large historical maps, integrating machine learning models with GIS-compatible output.
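Linking a recognized label to an external place-name dataset can be illustrated with fuzzy string matching, which also corrects OCR-style errors such as a zero read in place of an "O". The sketch below is an illustration only and is not the paper's linking method. It uses Python's standard `difflib`, and the lexicon and the `correct_label` helper are hypothetical.

```python
import difflib
from typing import Iterable, Optional


def correct_label(text: str, lexicon: Iterable[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest lexicon entry to a recognized label, or None.

    Hypothetical helper: `lexicon` stands in for an external place-name
    dataset. Matching is case-insensitive, and the canonical spelling is returned.
    """
    # Map upper-cased names back to their original spelling.
    candidates = {name.upper(): name for name in lexicon}
    matches = difflib.get_close_matches(text.upper(), list(candidates), n=1, cutoff=cutoff)
    return candidates[matches[0]] if matches else None


if __name__ == "__main__":
    places = ["Boston", "Salem", "Cambridge"]  # hypothetical lexicon
    print(correct_label("BOST0N", places))
    print(correct_label("XYZ", places))
```

Raising `cutoff` trades recall for precision. A real system would also weigh location and map context, not just string similarity.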
Findings
Processed over 60,000 maps and 100 million text labels.
Enabled integration with a collaborative web platform.
Improved accessibility and reusability of historical map data.
Abstract
Scanned historical maps in libraries and archives are valuable repositories of geographic data that often do not exist elsewhere. Despite the potential of machine learning tools like the Google Vision APIs for automatically transcribing text from these maps into machine-readable formats, they do not work well with large-sized images (e.g., high-resolution scanned documents), cannot infer the relation between the recognized text and other datasets, and are challenging to integrate with post-processing tools. This paper introduces the mapKurator system, an end-to-end system integrating machine learning models with a comprehensive data processing pipeline. mapKurator empowers automated extraction, post-processing, and linkage of text labels from large numbers of large-dimension historical map scans. The output data, comprising bounding polygons and recognized text, is in the standard…
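A common way to run text spotting on scans too large for a single model pass is to split the image into overlapping tiles, shift each tile's detections back into full-image pixel coordinates, and drop duplicates found in the overlap regions. The sketch below illustrates that idea under stated assumptions. The tile size, the overlap, the bounding-box IoU de-duplication rule, and all function names are hypothetical and are not taken from mapKurator.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Detection = Tuple[List[Point], str, float]  # (polygon, text, confidence)


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # last tile sits flush with the image edge
    return origins


def make_tiles(width: int, height: int, tile: int = 1000, overlap: int = 200) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) crop boxes; the sizes are illustrative assumptions."""
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]


def to_global(polygon: List[Point], x0: int, y0: int) -> List[Point]:
    """Shift a tile-local polygon into full-image pixel coordinates."""
    return [(px + x0, py + y0) for px, py in polygon]


def _bbox(poly: List[Point]) -> Tuple[float, float, float, float]:
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a: List[Point], b: List[Point]) -> float:
    """Intersection-over-union of the axis-aligned boxes around two polygons."""
    ax0, ay0, ax1, ay1 = _bbox(a)
    bx0, by0, bx1, by1 = _bbox(b)
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def merge(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Greedy de-duplication: keep the highest-confidence detection per overlapping group."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(iou(det[0], k[0]) < threshold for k in kept):
            kept.append(det)
    return kept
```

For example, a 2500 x 1500 pixel scan with 1000-pixel tiles and a 200-pixel overlap yields three tile columns and two tile rows, six crops in total. Overlap keeps labels that straddle a tile border intact in at least one tile, at the cost of duplicates that `merge` must remove.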
Taxonomy
Topics: Geographic Information Systems Studies · Web Data Mining and Analysis · Advanced Image and Video Retrieval Techniques
