Decipherment of Historical Manuscript Images
Xusen Yin, Nada Aldarrab, Beáta Megyesi, Kevin Knight

TL;DR
This paper presents unsupervised methods to automatically decipher historical manuscript images, enabling historians to read the contents of enciphered documents from the early modern period.
Contribution
It introduces novel unsupervised models for character segmentation, character-image clustering, and decipherment of cluster sequences, applies them to historical cipher manuscripts in both pipelined and joint configurations, and reports experiments on multiple ciphers.
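To make the character segmentation step concrete, here is a minimal sketch that is not the authors' model: it splits a binarized text line into candidate character spans using a vertical projection profile, treating every ink-free column as a gap. The function name `segment_columns`, the `min_width` parameter, and the 0/1 list-of-rows image format are hypothetical choices for illustration. Touching or overlapping glyphs, which are common in handwriting, defeat this heuristic, so it should be read as a baseline rather than a solution.

```python
from typing import List, Optional, Tuple


def segment_columns(image: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized text-line image into candidate character spans.

    image[r][c] is 1 for ink and 0 for background (hypothetical format).
    Returns half-open column ranges [start, end) of consecutive inked columns.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[c] for row in image) for c in range(width)]
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = c  # entering a run of inked columns
        elif ink == 0 and start is not None:
            if c - start >= min_width:  # drop specks narrower than min_width
                spans.append((start, c))
            start = None
    # Close a run that reaches the right edge of the image.
    if start is not None and width - start >= min_width:
        spans.append((start, width))
    return spans


line = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1, 0],
]
print(segment_columns(line))  # [(0, 2), (4, 5), (6, 9)]
```

Each returned span would then be cropped out as one character image for the clustering stage.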
Findings
Successful unsupervised decipherment of historical cipher images
Effective character-image clustering for historical cipher manuscripts
The proposed models outperform baseline approaches
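Character-image clustering groups glyph images that should map to the same cipher symbol. As a hedged illustration only, the sketch below runs Lloyd's k-means on flattened glyph bitmaps; the paper's clustering model may differ, and the use of k-means, the farthest-point seeding, and the names `kmeans` and `sq_dist` are all assumptions. In a real manuscript the number of clusters k is unknown and would have to be chosen or estimated.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened glyph images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means; returns (cluster id per glyph, centroids)."""
    # Deterministic farthest-point seeding, starting from the first glyph.
    centroids: List[Vector] = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each glyph goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its member glyphs.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Four 2x2 glyph bitmaps, flattened row by row (toy data).
glyphs = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
labels, centroids = kmeans(glyphs, k=2)
print(labels)  # [0, 0, 1, 1]
```

Each cluster id stands in for one cipher symbol, so a transcribed page becomes a sequence of integers that a decipherment model can consume.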
Abstract
European libraries and archives are filled with enciphered manuscripts from the early modern period. These include military and diplomatic correspondence, records of secret societies, private letters, and so on. Although they are enciphered with classical cryptographic algorithms, their contents are unavailable to working historians. We therefore attack the problem of automatically converting cipher manuscript images into plaintext. We develop unsupervised models for character segmentation, character-image clustering, and decipherment of cluster sequences. We experiment with both pipelined and joint models, and we give empirical results for multiple ciphers.
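Decipherment of cluster sequences can be framed as finding the key whose resulting plaintext a language model scores highest. The sketch below assumes a simple 1:1 substitution from cluster ids to letters and an add-one smoothed character bigram model, and enumerates every key exhaustively, which is only feasible for toy alphabets. The helpers `train_bigram_lm`, `score`, and `decipher` and the `^` start marker are hypothetical; practical systems replace brute force with EM training or heuristic search, and this is not presented as the authors' decipherment model.

```python
import itertools
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

START = "^"  # hypothetical start-of-text marker; must not occur in the training text
LogProb = Callable[[str, str], float]


def train_bigram_lm(text: str) -> Tuple[List[str], LogProb]:
    """Add-one smoothed character bigram model; returns (alphabet, log P(cur | prev))."""
    alphabet = sorted(set(text))
    tokens = [START] + list(text)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    v = len(alphabet)

    def logprob(prev: str, cur: str) -> float:
        return math.log((pair_counts[(prev, cur)] + 1) / (context_counts[prev] + v))

    return alphabet, logprob


def score(plain: Sequence[str], logprob: LogProb) -> float:
    """Log-probability of a candidate plaintext under the bigram model."""
    total, prev = 0.0, START
    for ch in plain:
        total += logprob(prev, ch)
        prev = ch
    return total


def decipher(cipher: List[int], alphabet: List[str], logprob: LogProb) -> Optional[Dict[int, str]]:
    """Exhaustively search 1:1 keys from cluster ids to letters; keep the best-scoring one."""
    clusters = sorted(set(cipher))
    best_key: Optional[Dict[int, str]] = None
    best_score = -math.inf
    # permutations() yields nothing if there are more clusters than letters.
    for letters in itertools.permutations(alphabet, len(clusters)):
        key = dict(zip(clusters, letters))
        s = score([key[c] for c in cipher], logprob)
        if s > best_score:
            best_key, best_score = key, s
    return best_key


alphabet, lp = train_bigram_lm("abcabcabcabc")
cipher = [2, 0, 1, 2, 0, 1]  # toy cluster-id sequence
key = decipher(cipher, alphabet, lp)
print(key, "".join(key[c] for c in cipher))  # {0: 'b', 1: 'c', 2: 'a'} abcabc
```

In this toy run, the rotated keys that yield `bcabca` and `cabcab` contain the same in-text bigrams as the correct answer; only the start-of-text bigram separates them, which shows how much a decipherment model leans on its language model.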
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Media Forensic Detection · Image Processing and 3D Reconstruction
