LIGHT: Multi-Modal Text Linking on Historical Maps
Yijun Lin, Rhett Olson, Junhan Wu, Yao-Yi Chiang, Jerod Weinman

TL;DR
LIGHT is a multi-modal approach that effectively links text fragments on historical maps by integrating linguistic, visual, and geometric features, outperforming existing methods in map text understanding.
Contribution
The paper introduces LIGHT, a novel multi-modal model that combines geometric, visual, and linguistic features for improved text linking on historical maps, addressing limitations of prior layout analysis methods.
Findings
LIGHT outperforms existing methods on MapText datasets.
The geometry-aware embedding improves spatial understanding of map text.
Multi-modal learning enhances text linking accuracy in complex map layouts.
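To make the linking task concrete, the sketch below shows one way predicted links could be turned into multi-word phrases. It assumes a model emits, for each detected word, the index of the word that follows it (or nothing if the word ends a phrase). That successor representation, the function name `assemble_phrases`, and the sample words are illustrative assumptions, not details taken from the paper, and the code does not implement LIGHT itself.

```python
from typing import Dict, List, Optional


def assemble_phrases(
    words: List[str], successor: Dict[int, Optional[int]]
) -> List[List[str]]:
    """Group words into ordered phrases by following successor links.

    `successor[i] = j` means word j is read right after word i.
    Words missing from `successor` (or mapped to None) end a phrase.
    """
    # Any word that is some other word's successor cannot start a phrase.
    has_predecessor = {j for j in successor.values() if j is not None}

    phrases: List[List[str]] = []
    for start in range(len(words)):
        if start in has_predecessor:
            continue
        chain: List[str] = []
        seen = set()  # guards against looping forever on a bad prediction
        cur: Optional[int] = start
        while cur is not None and cur not in seen:
            seen.add(cur)
            chain.append(words[cur])
            cur = successor.get(cur)
        phrases.append(chain)
    return phrases


# Hypothetical detections: four words scattered across a map sheet.
words = ["Lake", "Street", "Superior", "Main"]
# Hypothetical predicted links: "Lake" -> "Superior", "Main" -> "Street".
successor = {0: 2, 3: 1}
phrases = assemble_phrases(words, successor)
```

Under these assumptions, the quality of the recovered place names depends entirely on the predicted links, which is the part a text-linking model is responsible for.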
Abstract
Text on historical maps provides valuable information for studies in history, economics, geography, and other related fields. Unlike structured or semi-structured documents, text on maps varies significantly in orientation, reading order, shape, and placement. Many modern methods can detect and transcribe text regions, but they struggle to effectively "link" the recognized text fragments, e.g., determining a multi-word place name. Existing layout analysis methods model word relationships to improve text understanding in structured documents, but they primarily rely on linguistic features and neglect geometric information, which is essential for handling map text. To address these challenges, we propose LIGHT, a novel multi-modal approach that integrates linguistic, image, and geometric features for linking text on historical maps. In particular, LIGHT includes a geometry-aware…
Taxonomy
Topics: Geographic Information Systems Studies
