Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*
João Rodrigues, Luís Gomes, João Silva, António Branco, Rodrigo Santos, Henrique Lopes Cardoso, Tomás Osório

TL;DR
This paper introduces Albertina PT-*, a Transformer-based model that significantly improves neural encoding for European and American Portuguese, supporting Portuguese language technology development.
Contribution
The paper presents a new state-of-the-art Transformer model for Portuguese, trained on diverse datasets, and freely available for research and innovation.
Findings
Albertina PT-PT and PT-BR outperform previous models on language tasks.
Models are accessible on consumer hardware, promoting wider research.
Both variants set a new state of the art on downstream language processing tasks adapted for Portuguese.
Abstract
To advance the neural encoding of Portuguese (PT), and a fortiori the technological preparation of this language for the digital age, we developed a Transformer-based foundation model that sets a new state of the art in this respect for two of its variants, namely European Portuguese from Portugal (PT-PT) and American Portuguese from Brazil (PT-BR). To develop this encoder, which we named Albertina PT-*, a strong model was used as a starting point, DeBERTa, and its pre-training was done over data sets of Portuguese, namely over data sets we gathered for PT-PT and PT-BR, and over the brWaC corpus for PT-BR. The performance of Albertina and competing models was assessed by evaluating them on prominent downstream language processing tasks adapted for Portuguese. Both Albertina PT-PT and PT-BR versions are distributed free of charge and under the most permissive license possible and can…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: DeBERTa
