Pre-training Polish Transformer-based Language Models at Scale

Sławomir Dadas; Michał Perełkiewicz; Rafał Poświata

arXiv:2006.04229·cs.CL·June 11, 2020

TL;DR

This paper introduces large-scale Polish transformer-based language models trained on extensive datasets, demonstrating significant improvements across multiple NLP tasks compared to previous models.

Contribution

The authors present two new Polish BERT-based models trained on over 1 billion sentences, with detailed methodology and evaluation showing superior performance.

Findings

01 Models outperform previous approaches on 11 out of 13 tasks

02 Training on large-scale data improves NLP performance for Polish

03 Methodology for data collection and pre-training is detailed

Abstract

Transformer-based language models are now widely used in Natural Language Processing (NLP). This is especially true for the English language, for which many pre-trained models using transformer-based architectures have been published in recent years. This has driven forward the state of the art for a variety of standard NLP tasks such as classification, regression, and sequence labeling, as well as text-to-text tasks such as machine translation, question answering, and summarization. The situation has been different, however, for low-resource languages such as Polish. Although some transformer-based language models for Polish are available, none of them has come close to the scale, in terms of corpus size and number of parameters, of the largest English-language models. In this study, we present two language models for Polish based on the popular BERT architecture. The…

Code & Models

Repositories

sdadas/polish-roberta
PyTorch · Official

Taxonomy

Methods: Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections