ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language
Marcos Piau, Roberto Lotufo, Rodrigo Nogueira

TL;DR
This paper introduces ptt5-v2, a Portuguese-specific T5 model, analyzing how different pretraining configurations affect downstream task performance, and achieves state-of-the-art results on several Portuguese NLP benchmarks.
Contribution
It develops and evaluates a Portuguese T5 model with various pretraining settings, providing insights into their effects and releasing pretrained checkpoints and rerankers.
Findings
Varying the pretraining configuration changes downstream results only slightly relative to the baseline settings.
The models reach state-of-the-art results on two Portuguese downstream tasks, assin2 RTE and TweetSentBR.
Pretrained checkpoints and rerankers are publicly released.
Abstract
Despite advancements in Natural Language Processing (NLP) and the growing availability of pretrained models, the English language remains the primary focus of model development. Continued pretraining on language-specific corpora provides a practical solution for adapting models to other languages. However, the impact of different pretraining settings on downstream tasks remains underexplored. This work introduces ptt5-v2, investigating the continued pretraining of T5 models for Portuguese. We first develop a baseline set of settings and pretrain models with sizes up to 3B parameters. Finetuning on three Portuguese downstream tasks (assin2 STS, assin2 RTE, and TweetSentBR) yields SOTA results on the latter two. We then explore the effects of different pretraining configurations, including pretraining data quality, optimization strategies, and multi-epoch pretraining. Perhaps…
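To make "continued pretraining of T5 models" more concrete, here is a minimal sketch of the span-corruption format used by the original T5 objective. It assumes that continued pretraining reuses that objective, which the abstract above does not state. The function name `span_corrupt`, the hand-picked spans, and the example sentence are illustrative only; real pipelines sample spans at random over subword tokens.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (inputs, targets) pair from a token list.

    tokens: list of string tokens.
    spans:  list of (start, length) pairs to mask; must not overlap.
    Each masked span is replaced by one sentinel in the inputs, and the
    targets list each sentinel followed by the tokens it replaced.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(sorted(spans)):
        if start < cursor:
            raise ValueError("spans must not overlap")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # keep the unmasked prefix
        inputs.append(sentinel)               # one sentinel per masked span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])            # keep the unmasked tail
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


# Illustrative sentence and spans (assumptions, not taken from the paper).
tokens = "O gato dorme no sofá da sala".split()
inputs, targets = span_corrupt(tokens, [(1, 1), (4, 2)])
print(" ".join(inputs))
print(" ".join(targets))
```

The model is trained to produce the target sequence from the corrupted input, so only the masked tokens contribute to the loss.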
Code & Models
- unicamp-dl/ptt5-v2-small (Hugging Face model; 8 downloads, 2 likes)
- unicamp-dl/ptt5-v2-base (Hugging Face model; 427 downloads, 2 likes)
- unicamp-dl/ptt5-v2-large (Hugging Face model; 13 downloads, 1 like)
- unicamp-dl/ptt5-v2-3b (Hugging Face model; 1 download)
- unicamp-dl/monoptt5-small (Hugging Face model; 39 downloads, 1 like)
- unicamp-dl/monoptt5-base (Hugging Face model; 38 downloads)
- unicamp-dl/monoptt5-large (Hugging Face model; 1 download, 1 like)
- unicamp-dl/monoptt5-3b (Hugging Face model)
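The monoptt5 checkpoints are rerankers. As a rough sketch of how a monoT5-style reranker is typically used, the snippet below turns a pair of label logits into a relevance probability and sorts candidate documents by it. It assumes the monoptt5 models follow the monoT5 recipe (softmax over the logits of a "relevant" and a "not relevant" label token), which this page does not confirm. `relevance_probability`, `rerank`, and `toy_logits` are hypothetical names, and `toy_logits` is a word-overlap stand-in for a real model call so the example runs without downloading a checkpoint.

```python
import math


def relevance_probability(true_logit, false_logit):
    """Two-way softmax over the label logits; returns P(relevant)."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


def rerank(query, documents, logit_fn):
    """Return (score, document) pairs sorted by score, highest first.

    logit_fn(query, doc) must return (true_logit, false_logit). With a real
    reranker it would wrap a forward pass of the model; here it is injected
    so the sketch stays self-contained.
    """
    scored = [(relevance_probability(*logit_fn(query, doc)), doc)
              for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_logits(query, doc):
    """Hypothetical stand-in scorer: word overlap as the 'relevant' logit."""
    overlap = len(set(query.lower().split()) & set(doc.lower().split()))
    return float(overlap), 1.0


docs = ["O gato dorme no sofá", "Brasília é a capital do Brasil"]
for score, doc in rerank("capital do Brasil", docs, toy_logits):
    print(f"{score:.3f}  {doc}")
```

Injecting the scoring function keeps the ranking logic independent of any particular checkpoint or prompt template.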
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis
Methods: Gated Linear Unit · Attention Is All You Need · Sparse Evolutionary Training · Layer Normalization · Byte Pair Encoding · Attention Dropout · Dropout · SentencePiece · Linear Layer
