PORTULAN ExtraGLUE Datasets and Models: Kick-starting a Benchmark for the Neural Processing of Portuguese
Tomás Osório, Bernardo Leite, Henrique Lopes Cardoso, Luís Gomes, João Rodrigues, Rodrigo Santos, António Branco

TL;DR
This paper introduces PORTULAN ExtraGLUE, a benchmark of machine-translated datasets and fine-tuned baseline models for Portuguese language processing, intended to support future research on neural NLP for Portuguese.
Contribution
It provides a Portuguese benchmark aligned with mainstream English benchmarks, together with baseline models, created by machine-translating English datasets and fine-tuning neural language models with low-rank adaptation.
Findings
Datasets cover multiple language tasks for Portuguese.
Fine-tuned models serve as baselines for future research.
Resources are available for both European and Brazilian Portuguese.
Abstract
Leveraging research on the neural modelling of Portuguese, we contribute a collection of datasets for an array of language processing tasks and a corresponding collection of fine-tuned neural language models on these downstream tasks. To align with mainstream benchmarks in the literature, originally developed in English, and to kick start their Portuguese counterparts, the datasets were machine-translated from English with a state-of-the-art translation engine. The resulting PORTULAN ExtraGLUE benchmark is a basis for research on Portuguese whose improvement can be pursued in future work. Similarly, the respective fine-tuned neural language models, developed with a low-rank adaptation approach, are made available as baselines that can stimulate future work on the neural processing of Portuguese. All datasets and models have been developed and are made available for two variants of…
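The abstract says the baseline models were developed with a low-rank adaptation approach but, in this excerpt, gives no further detail. As a rough illustration of the general idea only, the sketch below follows the commonly described formulation in which a frozen weight matrix W is augmented by a trainable low-rank product, giving W + (alpha / r) * B A. The matrix sizes, the rank r, the scaling factor alpha, and all function names are assumptions made for this example and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: frozen weight, shape (d_out, d_in)
    a: trainable factor, shape (r, d_in)
    b: trainable factor, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Return (adapter parameter count, full weight parameter count)."""
    return r * (d_in + d_out), d_out * d_in


# Illustrative sizes only (assumed, not from the paper).
random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4.0

w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op

w_eff = lora_effective_weight(w, a, b, alpha)
adapter_count, full_count = trainable_params(768, 768, 8)
```

With B initialized to zero, the adapted weight equals the frozen weight before any training step, so fine-tuning starts from the pretrained model's behaviour. The parameter count shows why the approach is cheap: for a hypothetical 768 x 768 layer at rank 8, the adapter trains 12,288 values instead of 589,824.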
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
Methods: ALIGN
