Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses

Gabriele Sarti; Tommaso Caselli; Malvina Nissim; Arianna Bisazza

arXiv:2408.00584·cs.CL·August 2, 2024

TL;DR

This paper evaluates how well large language models solve Italian rebuses. General-purpose models perform poorly, and the gains from fine-tuning appear to come largely from memorization rather than genuine reasoning.

Contribution

It introduces a large collection of verbalized rebuses for Italian and uses it to systematically assess the rebus-solving ability of state-of-the-art LLMs, highlighting their weaknesses and the effect of fine-tuning.

Findings

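01 General-purpose LLMs such as LLaMA-3 and GPT-4o perform poorly on Italian rebuses

02 Fine-tuning improves performance, but the gains are largely due to memorization

03 Rebus solving remains a challenging test bed for linguistic proficiency and sequential instruction following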

Abstract

Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models' performance. However, we find that performance gains from training are largely motivated by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models' linguistic proficiency and sequential instruction-following skills.
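The multi-step structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a verbalized rebus replaces each image with a textual clue, that solving proceeds by resolving clues into words, joining them with the given letters, and then re-splitting the result according to a list of word lengths. The function names `first_pass` and `resegment`, the length-key format, and the English toy puzzle are all hypothetical and are not taken from the paper or its dataset.

```python
from typing import List


def first_pass(fragments: List[str]) -> str:
    """Join given letters and already-resolved clue words into one letter string."""
    return "".join(f.replace(" ", "") for f in fragments).upper()


def resegment(letters: str, key: List[int]) -> List[str]:
    """Split the letter string into words whose lengths follow the key."""
    if sum(key) != len(letters):
        raise ValueError("key lengths do not add up to the number of letters")
    words, pos = [], 0
    for n in key:
        words.append(letters[pos:pos + n])
        pos += n
    return words


# Toy puzzle (invented for illustration, not from the dataset):
#   C [opposite of new] [to be victorious] D, with length key (4, 4)
# Clues resolved by hand: "opposite of new" -> OLD, "to be victorious" -> WIN
fragments = ["C", "OLD", "WIN", "D"]
letters = first_pass(fragments)
solution = resegment(letters, [4, 4])
print(letters, "->", " ".join(solution))
```

The sketch only covers the mechanical steps. The hard part for a model is resolving each clue so that the final segmentation yields a valid phrase, which is the constraint the abstract refers to.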

Code & Models

Repositories

gsarti/verbalized-rebus

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training