Challenging the Abilities of Large Language Models in Italian: a Community Initiative

Malvina Nissim; Danilo Croce; Viviana Patti; Pierpaolo Basile; Giuseppe Attanasio; Elio Musacchio; Matteo Rinaldi; Federico Borazio; Maria Francis; Jacopo Gili; Daniel Scalena; Begoña Altuna; Ekhi Azurmendi; Valerio Basile; Luisa Bentivogli; Arianna Bisazza; Marianna Bolognesi; Dominique Brunato; Tommaso Caselli; Silvia Casola; Maria Cassese; Mauro Cettolo; Claudia Collacciani; Leonardo De Cosmo; Maria Pia Di Buono; Andrea Esuli; Julen Etxaniz; Chiara Ferrando; Alessia Fidelangeli; Simona Frenda; Achille Fusco; Marco Gaido; Andrea Galassi; Federico Galli; Luca Giordano; Mattia Goffetti; Itziar Gonzalez-Dios; Lorenzo Gregori; Giulia Grundler; Sandro Iannaccone; Chunyang Jiang; Moreno La Quatra; Francesca Lagioia; Soda Marem Lo; Marco Madeddu; Bernardo Magnini; Raffaele Manna; Fabio Mercorio; Paola Merlo; Arianna Muti; Vivi Nastase; Matteo Negri; Dario Onorati; Elena Palmieri; Sara Papi; Lucia Passaro; Giulia Pensa; Andrea Piergentili; Daniele Potertì; Giovanni Puccetti; Federico Ranaldi; Leonardo Ranaldi; Andrea Amelio Ravelli; Martina Rosola; Elena Sofia Ruzzetti; Giuseppe Samo; Andrea Santilli; Piera Santin; Gabriele Sarti; Giovanni Sartor; Beatrice Savoldi; Antonio Serino; Andrea Seveso; Lucia Siciliani; Paolo Torroni; Rossella Varvara; Andrea Zaninello; Asya Zanollo; Fabio Massimo Zanzotto; Kamyar Zeinalipour; Andrea Zugarini

arXiv:2512.04759·cs.CL·December 5, 2025

TL;DR

This paper introduces CALAMITA, a comprehensive community-driven benchmark for evaluating large language models in Italian across diverse tasks, emphasizing methodology, community engagement, and continuous updates.

Contribution

It presents a large-scale, collaborative benchmarking framework for Italian LLMs, focusing on methodology, diverse tasks, and sustainable evaluation practices.

Findings

01. Identified strengths and weaknesses of four open-weight LLMs in Italian.

02. Highlighted the importance of fine-grained, task-specific metrics.

03. Demonstrated the benefits of community engagement in benchmarking.

Abstract

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these models, especially for languages beyond English, remains limited. "Challenging the Abilities of LAnguage Models in ITAlian" (CALAMITA) is a large-scale collaborative benchmarking initiative for Italian, coordinated under the Italian Association for Computational Linguistics. Unlike existing efforts that focus on leaderboards, CALAMITA foregrounds methodology: it federates more than 80 contributors from academia, industry, and the public sector to design, document, and evaluate a diverse collection of tasks, covering linguistic competence, commonsense reasoning, factual consistency, fairness, summarization, translation, and code generation. Through this process, we not only assembled a benchmark of over 20…

Taxonomy

Topics: Text Readability and Simplification · Computational and Text Analysis Methods · Topic Modeling