Solving Historical Dictionary Codes with a Neural Language Model
Christopher Chu, Raphael Valenti, Kevin Knight

TL;DR
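This paper introduces a method that uses a neural language model to decode historical word-based substitution codes, correctly deciphering 75.1% of cipher-word tokens in letters exchanged between US Army General James Wilkinson and agents of the Spanish Crown in the late 1700s and early 1800s.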
Contribution
It presents a decoding approach that constructs a decoding lattice and searches it with a neural language model to solve difficult word-based substitution codes.
Findings
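Deciphered 75.1% of cipher-word tokens correctly
Applied the method to enciphered letters from the late 1700s and early 1800s, exchanged between US Army General James Wilkinson and agents of the Spanish Crown and obtained from the US Library of Congress
Showed that a neural language model can guide the search over a decoding lattice for word-based codes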
Abstract
We solve difficult word-based substitution codes by constructing a decoding lattice and searching that lattice with a neural language model. We apply our method to a set of enciphered letters exchanged between US Army General James Wilkinson and agents of the Spanish Crown in the late 1700s and early 1800s, obtained from the US Library of Congress. We are able to decipher 75.1% of the cipher-word tokens correctly.
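The idea of searching a decoding lattice with a language model can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the lattice is built or which neural model is used, so the example assumes each cipher-word token already has a small list of candidate plaintext words, and it substitutes a hypothetical toy bigram table (BIGRAM_LOGP) for the neural language model. All names here (toy_lm_score, beam_search, token_accuracy, LATTICE) are illustrative assumptions. A beam search keeps the highest-scoring partial decodings at each position, and a token-accuracy helper mirrors the kind of metric reported above.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# ASSUMPTION: a toy bigram table standing in for a neural language model.
# The probabilities are invented purely for illustration.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): math.log(0.5),
    ("<s>", "they"): math.log(0.2),
    ("the", "general"): math.log(0.4),
    ("the", "gentle"): math.log(0.05),
    ("general", "arrived"): math.log(0.3),
    ("general", "army"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-4)  # flat penalty for bigrams not in the table


def toy_lm_score(prev: str, word: str) -> float:
    """Log-probability of `word` following `prev` under the toy model."""
    return BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)


def beam_search(
    lattice: Sequence[Sequence[str]],
    score: Callable[[str, str], float],
    beam_size: int = 4,
) -> Tuple[List[str], float]:
    """Return the best-scoring path through the lattice and its log-probability.

    lattice[i] lists the candidate plaintext words for cipher token i.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    for position, candidates in enumerate(lattice):
        if not candidates:
            raise ValueError(f"no candidates at lattice position {position}")
        # Extend every surviving hypothesis with every candidate word.
        expanded = [
            (logp + score(path[-1], word), path + [word])
            for logp, path in beams
            for word in candidates
        ]
        # Keep only the top `beam_size` hypotheses by total log-probability.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]
    best_logp, best_path = beams[0]
    return best_path[1:], best_logp  # drop the start symbol


def token_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of positions where the predicted token matches the gold token."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical lattice: candidate plaintext words for three cipher tokens.
LATTICE = [["the", "they"], ["general", "gentle"], ["arrived", "army"]]
decoded, logp = beam_search(LATTICE, toy_lm_score)
print(decoded, round(logp, 3))
```

In this sketch the scorer is passed in as a function, so a neural model that scores a word given its left context could in principle replace toy_lm_score without changing the search loop; a model that conditions on the full history would need the whole path rather than only the previous word.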
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Authorship Attribution and Profiling
