Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP
Josh Rozner, Christopher Potts, Kyle Mahowald

TL;DR
This paper introduces a new cryptic crossword dataset as a challenging benchmark for NLP, evaluates existing models, and proposes a curriculum fine-tuning approach to improve their understanding of complex, creative language puzzles.
Contribution
The paper contributes a dataset and benchmark of cryptic crossword clues, together with a curriculum-based fine-tuning method intended to improve NLP models' ability to solve complex, compositional language puzzles.
Findings
Neural models underperform on cryptic clues
Curriculum fine-tuning improves model performance
Models show partial human-like solving strategies
Abstract
Cryptic crosswords, the dominant crossword variety in the UK, are a promising target for advancing NLP systems that seek to process semantically complex, highly compositional language. Cryptic clues read like fluent natural language but are adversarially composed of two parts: a definition and a wordplay cipher requiring character-level manipulations. Expert humans use creative intelligence to solve cryptics, flexibly combining linguistic, world, and domain knowledge. In this paper, we make two main contributions. First, we present a dataset of cryptic clues as a challenging new benchmark for NLP systems that seek to process compositional language in more creative, human-like ways. After showing that three non-neural approaches and T5, a state-of-the-art neural language model, do not achieve good performance, we make our second main contribution: a novel curriculum approach, in which…
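To make the two-part clue structure concrete, here is a minimal sketch of how one common wordplay type, the anagram, could be checked at the character level. The clue "Silent, rearranged to pay attention (6)" is a made-up illustration, not an item from the paper's dataset, and the helper names `is_anagram` and `solve_anagram_clue` and the hand-written candidate list are assumptions for this example. It does not represent the paper's models or baselines.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count alphabetic characters, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def is_anagram(fodder: str, answer: str) -> bool:
    """True if `answer` uses exactly the letters of `fodder`."""
    return letter_counts(fodder) == letter_counts(answer)


def solve_anagram_clue(fodder: str, length: int, candidates: list[str]) -> list[str]:
    """Keep candidates that satisfy both the enumeration and the wordplay.

    `candidates` stands in for words matching the definition half of the
    clue (hypothetical here; a real solver would need a thesaurus or model).
    """
    return [w for w in candidates if len(w) == length and is_anagram(fodder, w)]


# Hypothetical toy clue: "Silent, rearranged to pay attention (6)"
#   definition: "pay attention"; wordplay: anagram ("rearranged") of "silent"
definition_candidates = ["listen", "attend", "heed", "notice"]
print(solve_anagram_clue("silent", 6, definition_candidates))  # ['listen']
```

The point of the sketch is that neither half suffices alone: several words fit the definition, and the wordplay only narrows them once the solver has recognised which words are the indicator and which are the letters to manipulate.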
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Inverse Square Root Schedule · Adafactor · Dropout · SentencePiece
