The Catalan Language CLUB

Carlos Rodriguez-Penagos, Carme Armentano-Oller, Marta Villegas, Maite Melero, Aitor Gonzalez, Ona de Gibert Bonet, and Casimiro Carrino Pio

arXiv:2112.01894·cs.CL·December 6, 2021

Open Access

TL;DR

The paper introduces the Catalan Language Understanding Benchmark (CLUB), a GLUE-style collection of datasets for evaluating language models across multiple Catalan NLU tasks.

Contribution

It presents the first dedicated Catalan language understanding benchmark, enabling consistent evaluation of models on diverse NLU tasks.

Findings

01. Provides a suite of datasets for Catalan NLU tasks

02. Enables standardized evaluation of Catalan language models

03. Supports AI development for the Catalan language

Abstract

The Catalan Language Understanding Benchmark (CLUB) encompasses various datasets representative of different NLU tasks that enable accurate evaluations of language models, following the General Language Understanding Evaluation (GLUE) example. It is part of AINA and PlanTL, two public funding initiatives to empower the Catalan language in the Artificial Intelligence era.

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling