Challenging the Abilities of Large Language Models in Italian: a Community Initiative
Malvina Nissim, Danilo Croce, Viviana Patti, Pierpaolo Basile, Giuseppe Attanasio, Elio Musacchio, Matteo Rinaldi, Federico Borazio, Maria Francis, Jacopo Gili, Daniel Scalena, Begoña Altuna, Ekhi Azurmendi, Valerio Basile, Luisa Bentivogli, Arianna Bisazza, Marianna Bolognesi

TL;DR
This paper introduces CALAMITA, a comprehensive community-driven benchmark for evaluating large language models in Italian across diverse tasks, emphasizing methodology, community engagement, and continuous updates.
Contribution
It presents a large-scale, collaborative benchmarking framework for Italian LLMs, focusing on methodology, diverse tasks, and sustainable evaluation practices.
Findings
Identified strengths and weaknesses of four open-weight LLMs in Italian.
Highlighted the importance of fine-grained, task-specific metrics.
Demonstrated the benefits of community engagement in benchmarking.
Abstract
The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these models, especially for languages beyond English, remains limited. "Challenging the Abilities of LAnguage Models in ITAlian" (CALAMITA) is a large-scale collaborative benchmarking initiative for Italian, coordinated under the Italian Association for Computational Linguistics. Unlike existing efforts that focus on leaderboards, CALAMITA foregrounds methodology: it federates more than 80 contributors from academia, industry, and the public sector to design, document, and evaluate a diverse collection of tasks, covering linguistic competence, commonsense reasoning, factual consistency, fairness, summarization, translation, and code generation. Through this process, we not only assembled a benchmark of over 20…
Taxonomy
Topics: Text Readability and Simplification · Computational and Text Analysis Methods · Topic Modeling
