From Rosetta to Match-Up: A Paired Corpus of Linguistic Puzzles with Human and LLM Benchmarks
Neh Majmudar, Anne Huang, Jinfan Frank Hu, Elena Filatova

TL;DR
This paper introduces a new dataset of paired linguistic puzzles in Rosetta Stone and Match-Up formats, along with an evaluation of human and LLM performance, revealing an all-or-nothing solving pattern.
Contribution
It provides a systematic method for converting Rosetta Stone puzzles into Match-Up counterparts and offers insights into the linguistic reasoning of humans and LLMs.
Findings
Both expert human solvers and LLMs show an all-or-nothing pattern on Match-Up puzzles: they either solve a puzzle completely or fail entirely.
The dataset enables comparative analysis of puzzle difficulty.
LLMs' performance varies significantly across puzzle formats.
Abstract
In this paper, we examine linguistic puzzles used in high school linguistics competitions, focusing on two common formats: Rosetta Stone and Match-Up. We propose a systematic procedure for converting existing Rosetta Stone puzzles into corresponding Match-Up counterparts. Because linguistic puzzle creation is complex and time-consuming, our method provides an efficient way to accelerate the generation of new puzzles. We evaluate the resulting Rosetta Stone-Match-Up pairs with both human participants and large language models (LLMs). Our results show that both expert human solvers and LLMs display an all-or-nothing pattern on Match-Up puzzles, either solving them completely or failing entirely. This work contributes a new dataset of paired puzzles and provides a detailed evaluation of puzzle difficulty across formats, offering insights into both human and machine linguistic reasoning.
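The abstract does not spell out the conversion procedure, so the following is only a minimal sketch of the general idea, under the assumption that a Rosetta Stone puzzle can be represented as aligned (source, translation) pairs and that a Match-Up puzzle presents the same items with the alignment hidden. The function names `to_match_up` and `score`, the `seed` parameter, and the toy strings are hypothetical and are not taken from the paper.

```python
import random


def to_match_up(pairs, seed=0):
    """Hide the alignment of (source, translation) pairs by shuffling one side.

    Assumption: this shuffling step is an illustration only, not the
    authors' actual conversion procedure.
    """
    rng = random.Random(seed)  # seeded so the puzzle is reproducible
    sources = [s for s, _ in pairs]
    translations = [t for _, t in pairs]
    order = list(range(len(pairs)))
    rng.shuffle(order)
    shuffled = [translations[i] for i in order]
    # answer_key[i] is the index in `shuffled` of the translation of sources[i]
    answer_key = [order.index(i) for i in range(len(pairs))]
    return sources, shuffled, answer_key


def score(answer_key, guess):
    """Return the fraction of items matched correctly."""
    if not answer_key or len(answer_key) != len(guess):
        raise ValueError("guess must be non-empty and match the key length")
    correct = sum(a == g for a, g in zip(answer_key, guess))
    return correct / len(answer_key)


# Invented placeholder strings, not real language data.
toy_pairs = [
    ("mika lo", "the dog sleeps"),
    ("mika ta", "the dog eats"),
    ("suri lo", "the cat sleeps"),
    ("suri ta", "the cat eats"),
]
sources, shuffled, answer_key = to_match_up(toy_pairs)
```

With a per-item score like this, an all-or-nothing pattern would appear as puzzle scores clustering at 0.0 and 1.0 rather than spreading across intermediate values.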
