Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps
Xue Xia, Randall Balestriero, Tao Zhang, Lorenz Hurni

TL;DR
This paper introduces a self-supervised video instance segmentation approach to improve geographic entity alignment in historical maps, reducing manual annotation needs and enhancing accuracy in linking map features across datasets.
Contribution
The study presents a novel self-supervised training method for video instance segmentation applied to historical maps, including synthetic video generation for pretraining.
Findings
24.9% improvement in average precision (AP) over the baseline
0.23 increase in F1 score
Effective reduction in manual annotation requirements
Abstract
Tracking geographic entities from historical maps, such as buildings, offers valuable insights into cultural heritage, urbanization patterns, environmental changes, and various historical research endeavors. However, linking these entities across diverse maps remains a persistent challenge for researchers. Traditionally, this has been addressed through a two-step process: detecting entities within individual maps and then associating them via a heuristic-based post-processing step. In this paper, we propose a novel approach that combines segmentation and association of geographic entities in historical maps using video instance segmentation (VIS). This method significantly streamlines geographic entity alignment and enhances automation. However, acquiring high-quality, video-format training data for VIS models is prohibitively expensive, especially for historical maps that often contain…
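To make the traditional two-step baseline concrete, the following is a minimal sketch of a heuristic association step, assuming each map has already been run through a detector that yields axis-aligned bounding boxes in a shared, georeferenced coordinate frame. The greedy IoU matching, the `associate` helper, and the 0.5 threshold are illustrative assumptions, not the post-processing used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(boxes_a: List[Box], boxes_b: List[Box],
              threshold: float = 0.5) -> Dict[int, int]:
    """Greedily link entities of map A to entities of map B by descending IoU.

    Returns a dict mapping an index in boxes_a to an index in boxes_b.
    Entities with no partner above the threshold stay unmatched.
    """
    candidates = sorted(
        ((iou(a, b), i, j)
         for i, a in enumerate(boxes_a)
         for j, b in enumerate(boxes_b)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_b = set()
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining pairs overlap too little to be linked
        if i in matches or j in used_b:
            continue  # each entity is linked at most once
        matches[i] = j
        used_b.add(j)
    return matches


# Two buildings detected on an earlier map sheet, three on a later one.
earlier = [(0, 0, 10, 10), (20, 20, 30, 30)]
later = [(21, 20, 31, 30), (0, 1, 10, 11), (50, 50, 60, 60)]
links = associate(earlier, later)
```

In this toy case the two earlier buildings are linked to their slightly shifted counterparts and the third later building is left unmatched. Heuristics of this kind depend on a hand-tuned threshold and on accurate detections in every sheet, which is the brittleness a joint segmentation-and-association model such as VIS aims to avoid.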
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques
