What Makes Cryptic Crosswords Challenging for LLMs?

Abdelrahman Sadallah; Daria Kotova; Ekaterina Kochmar

arXiv:2412.09012 · cs.CL · January 15, 2025

TL;DR

This paper benchmarks and analyzes the challenges faced by large language models in solving cryptic crosswords, revealing significant performance gaps compared to humans and exploring underlying reasons for their struggles.

Contribution

It establishes benchmark results for popular LLMs on cryptic crosswords and investigates the reasons behind their poor performance.

Findings

01. LLMs perform significantly worse than humans on cryptic crosswords.

02. Benchmark results are provided for Gemma2, LLaMA3, and ChatGPT.

03. The paper offers insights into why LLMs struggle with this task.

Abstract

Cryptic crosswords are puzzles that rely on general knowledge and the solver's ability to manipulate language on different levels, dealing with various types of wordplay. Previous research suggests that solving such puzzles is challenging even for modern NLP models, including Large Language Models (LLMs). However, there is little to no research on the reasons for their poor performance on this task. In this paper, we establish the benchmark results for three popular LLMs: Gemma2, LLaMA3 and ChatGPT, showing that their performance on this task is still significantly below that of humans. We also investigate why these models struggle to achieve superior performance. We release our code and the datasets we introduce at https://github.com/bodasadallah/decrypting-crosswords.

Code & Models

Repositories

bodasadallah/decrypting-crosswords
PyTorch · Official

Taxonomy

Topics: Library Science and Information Systems