Can Sequence-to-Sequence Models Crack Substitution Ciphers?

Nada Aldarrab; Jonathan May

arXiv:2012.15229 · cs.CL · June 3, 2021

TL;DR

This paper introduces an end-to-end multilingual sequence-to-sequence model capable of deciphering simple substitution ciphers from both synthetic and real historical data, without needing prior language identification.

Contribution

The paper presents a novel end-to-end neural model that can decipher substitution ciphers across multiple languages without explicit language detection, handling noisy and historical texts.

Findings

01. Effective on synthetic and real ciphers
02. No need for language identification
03. Robust to noise in ciphertexts

Abstract

Decipherment of historical ciphers is a challenging problem. The language of the target plaintext might be unknown, and ciphertext can have a lot of noise. State-of-the-art decipherment methods use beam search and a neural language model to score candidate plaintext hypotheses for a given cipher, assuming the plaintext language is known. We propose an end-to-end multilingual model for solving simple substitution ciphers. We test our model on synthetic and real historical ciphers and show that our proposed method can decipher text without explicit language identification while still being robust to noise.
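
For readers unfamiliar with the term, the following is a minimal sketch of what a simple substitution cipher is: each plaintext letter is replaced by exactly one cipher symbol under a fixed one-to-one key. It is not the paper's model, which has to recover the plaintext without the key. The helper names (`make_key`, `encipher`, `decipher`), the lowercase Latin alphabet, and the sample sentence are all assumptions chosen for illustration.

```python
import random
import string


def make_key(seed=0):
    """Build a random one-to-one letter mapping (hypothetical helper)."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    # Each plaintext letter maps to exactly one cipher letter.
    return dict(zip(letters, shuffled))


def encipher(plaintext, key):
    """Replace every letter via the key; leave other characters unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext)


def decipher(ciphertext, key):
    """Invert the key and map cipher letters back to plaintext letters."""
    inverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)


key = make_key(seed=42)
plaintext = "decipherment of historical ciphers"
ciphertext = encipher(plaintext, key)
recovered = decipher(ciphertext, key)
```

With the key in hand, decryption is a trivial table lookup. The decipherment problem the abstract describes is the harder case where only the ciphertext is available and, in this paper's setting, the plaintext language is unknown as well.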

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling