Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses
Gabriele Sarti, Tommaso Caselli, Malvina Nissim, Arianna Bisazza

TL;DR
This paper evaluates large language models on Italian rebuses and finds that they perform poorly, and that the gains from fine-tuning largely reflect memorization rather than genuine reasoning skills.
Contribution
It introduces a large collection of verbalized Italian rebuses and uses it to assess both general-purpose and fine-tuned LLMs on rebus solving.
Findings
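- General-purpose LLMs such as LLaMA-3 and GPT-4o perform poorly on Italian rebuses
- Fine-tuning improves performance, but the gains largely stem from memorization
- Rebus solving remains a challenging test bed for linguistic proficiency and sequential instruction following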
Abstract
Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models' performance. However, we find that performance gains from training are largely driven by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models' linguistic proficiency and sequential instruction-following skills.
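To make the "constrained multi-step reasoning" concrete, here is a minimal sketch of how a verbalized rebus could be resolved in two steps: first fill in the word behind each clue, then re-segment the resulting letters according to the word lengths of the hidden phrase. The bracket notation for clues, the function name `solve_verbalized_rebus`, and the toy puzzle are assumptions made for illustration; they are not taken from the dataset and do not reproduce the authors' pipeline.

```python
import re


def solve_verbalized_rebus(rebus, answers, key):
    """Resolve a toy verbalized rebus (hypothetical format).

    rebus:   text with bracketed clues and loose letters, e.g. "[clue] DI"
    answers: the words answering the bracketed clues, in order
    key:     word lengths of the hidden phrase, e.g. [3, 4]
    """
    remaining = iter(answers)
    # Step 1: replace each [clue] with its answer to obtain the first pass.
    first_pass = re.sub(r"\[[^\]]*\]", lambda _match: next(remaining), rebus)
    # Step 2: drop spaces and re-segment the letters using the key.
    letters = first_pass.replace(" ", "").lower()
    if sum(key) != len(letters):
        raise ValueError("key does not match the number of letters")
    words, pos = [], 0
    for length in key:
        words.append(letters[pos:pos + length])
        pos += length
    return first_pass, " ".join(words)


# Toy puzzle: "Corre sui binari" (it runs on rails) -> "treno" (train).
first_pass, solution = solve_verbalized_rebus(
    "[Corre sui binari] DI", ["treno"], [3, 4]
)
print(first_pass)  # treno DI
print(solution)    # tre nodi
```

In this sketch the clue answers are supplied by hand. In the task described above, a model has to produce them itself and then carry out the re-segmentation correctly, so an error at either step yields a wrong solution.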
Code & Models
- 🤗 gsarti/phi3-mini-rebus-solver-adapters (model)
- 🤗 gsarti/llama-3.1-8b-rebus-solver-adapters (model)
- 🤗 gsarti/llama-3.1-8b-rebus-solver-fp16 (model, 1 download)
- 🤗 gsarti/llama-3.1-8b-rebus-solver-Q8_0-GGUF (model, 15 downloads)
- 🤗 gsarti/phi3-mini-rebus-solver-fp16 (model, 3 downloads)
- 🤗 gsarti/phi3-mini-rebus-solver-Q8_0-GGUF (model, 39 downloads)
- 🤗 gsarti/gemma-2-2b-rebus-solver-adapters (model)
- 🤗 gsarti/gemma-2-2b-rebus-solver-fp16 (model, 9 downloads)
- 🤗 gsarti/gemma-2-2b-rebus-solver-Q8_0-GGUF (model, 20 downloads)
- 🤗 RichardErkhov/gsarti_-_gemma-2-2b-rebus-solver-fp16-awq (model, 2 downloads)
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
