Hyper-Local Deformable Transformers for Text Spotting on Historical Maps

Yijun Lin; Yao-Yi Chiang

arXiv:2506.15010·cs.CV·June 19, 2025

TL;DR

This paper introduces PALETTE, a novel end-to-end text spotting method for historical maps that uses hyper-local features and synthetic training data, significantly improving detection and recognition of complex, rotated, and lengthy map texts.

Contribution

The paper presents PALETTE with a hyper-local sampling module and positional embeddings, along with SynthMap+ for synthetic training data, advancing text extraction from diverse and challenging historical maps.

Findings

01

PALETTE outperforms state-of-the-art methods on benchmark datasets.

02

Synthetic data from SynthMap+ is effective for training models on complex map text.

03

PALETTE was deployed on 60,000 maps, generating 100 million text labels.

Abstract

Text on historical maps contains valuable information providing georeferenced historical, political, and cultural contexts. However, text extraction from historical maps is challenging due to the lack of (1) effective methods and (2) training data. Previous approaches use ad-hoc steps tailored to specific map styles only. Recent machine learning-based text spotters (e.g., for scene images) have the potential to solve these challenges because of their flexibility in supporting various types of text instances. However, these methods still face challenges in extracting precise image features for predicting every sub-component (boundary points and characters) in a text instance. This is critical because map text can be lengthy and highly rotated with complex backgrounds, making it difficult to detect relevant image features from a rough text region. This paper proposes PALETTE, an end-to-end…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques