ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language

Marcos Piau; Roberto Lotufo; Rodrigo Nogueira

arXiv:2406.10806 · cs.CL · November 19, 2024

Open Access · 8 Models

TL;DR

This paper introduces ptt5-v2, a set of T5 models adapted to Portuguese through continued pretraining, with sizes up to 3B parameters. It analyzes how different pretraining configurations affect downstream task performance and reports state-of-the-art results on two of the three Portuguese tasks evaluated (assin2 RTE and TweetSentBR).

Contribution

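The paper continues pretraining T5 models on Portuguese data under several settings (pretraining data quality, optimization strategy, and multi-epoch pretraining), reports how those settings affect downstream results, and releases the pretrained checkpoints and rerankers.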

Findings

01 Changing the pretraining configuration has only a subtle effect relative to the baseline settings.

02 Finetuned models reach state-of-the-art results on two Portuguese downstream tasks, assin2 RTE and TweetSentBR.

03 Pretrained checkpoints and rerankers are publicly released.

Abstract

Despite advancements in Natural Language Processing (NLP) and the growing availability of pretrained models, the English language remains the primary focus of model development. Continued pretraining on language-specific corpora provides a practical solution for adapting models to other languages. However, the impact of different pretraining settings on downstream tasks remains underexplored. This work introduces ptt5-v2, investigating the continued pretraining of T5 models for Portuguese. We first develop a baseline set of settings and pretrain models with sizes up to 3B parameters. Finetuning on three Portuguese downstream tasks (assin2 STS, assin2 RTE, and TweetSentBR) yields SOTA results on the latter two. We then explore the effects of different pretraining configurations, including pretraining data quality, optimization strategies, and multi-epoch pretraining. Perhaps…

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Speech Recognition and Synthesis

Methods: Gated Linear Unit · Attention Is All You Need · Sparse Evolutionary Training · Layer Normalization · Byte Pair Encoding · Attention Dropout · Dropout · SentencePiece · Linear Layer