Segmenting Numerical Substitution Ciphers
Nada Aldarrab, Jonathan May

TL;DR
This paper introduces automatic segmentation methods for unsegmented numerical substitution ciphers using BPE and language models, achieving low error rates and solving a historically unsolved cipher.
Contribution
The paper presents the first automatic methods for segmenting unsegmented ciphers, based on BPE and unigram language models, and a lattice-based method for solving non-deterministic ciphers with existing keys.
Findings
Achieved an average segmentation error of 2% on 100 randomly generated monoalphabetic ciphers
Achieved an average segmentation error of 27% on 3 real homophonic ciphers
Fully solved the historical IA cipher using the proposed methods
Abstract
Deciphering historical substitution ciphers is a challenging problem. Example problems that have been previously studied include detecting cipher type, detecting plaintext language, and acquiring the substitution key for segmented ciphers. However, attacking unsegmented, space-free ciphers is still a challenging task. Segmentation (i.e., finding substitution units) is the first step towards cracking those ciphers. In this work, we propose the first automatic methods to segment those ciphers using Byte Pair Encoding (BPE) and unigram language models. Our methods achieve an average segmentation error of 2% on 100 randomly generated monoalphabetic ciphers and 27% on 3 real homophonic ciphers. We also propose a method for solving non-deterministic ciphers with existing keys using a lattice and a pretrained language model. Our method leads to the full solution of the IA cipher; a real…
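To make the segmentation idea concrete, the following is a minimal sketch of generic greedy BPE applied to an unsegmented digit stream. It is not the authors' implementation: the function names (`learn_bpe_merges`, `merge_pair`, `segment`), the tie-breaking rule, and the stopping criterion are assumptions made for illustration, and the paper's actual procedure may differ.

```python
from collections import Counter


def merge_pair(tokens, left, right):
    """Replace each non-overlapping occurrence of (left, right) with one token."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe_merges(digits, num_merges):
    """Learn up to num_merges merges from a digit string (hypothetical helper).

    Assumption: ties are broken by the lexicographically larger pair, and
    learning stops early once no pair occurs at least twice.
    """
    tokens = list(digits)  # start from single digits
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:
            break
        merges.append((left, right))
        tokens = merge_pair(tokens, left, right)
    return merges


def segment(digits, merges):
    """Apply learned merges in order to split a digit string into candidate units."""
    tokens = list(digits)
    for left, right in merges:
        tokens = merge_pair(tokens, left, right)
    return tokens


if __name__ == "__main__":
    cipher = "121212"
    merges = learn_bpe_merges(cipher, num_merges=1)
    print(segment(cipher, merges))
```

The resulting tokens are only candidate substitution units. The number of merges controls how long the units become, so in practice it would need to be chosen with some external criterion, which this sketch does not attempt.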
Taxonomy
Topics: Chemical Synthesis and Analysis · Machine Learning in Bioinformatics · Natural Language Processing Techniques
