BERTje: A Dutch BERT Model

Wietse de Vries; Andreas van Cranenburgh; Arianna Bisazza; Tommaso Caselli; Gertjan van Noord; Malvina Nissim

arXiv:1912.09582 · cs.CL · December 23, 2019 · 215 citations

TL;DR

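BERTje is a monolingual Dutch BERT model trained on a large, diverse dataset of 2.4 billion tokens. It outperforms the equally sized multilingual BERT on part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis, and is publicly available for research use.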

Contribution

This paper introduces BERTje, a monolingual Dutch BERT model that uses the same architecture and parameters as BERT and is trained on 2.4 billion tokens. It consistently outperforms the equally sized multilingual BERT on four downstream Dutch NLP tasks.

Findings

01

BERTje outperforms multilingual BERT on Dutch NLP tasks

02

BERTje is trained on a large, diverse dataset of 2.4 billion tokens, whereas multilingual BERT is based only on Wikipedia text

03

The model is publicly available for research use

Abstract

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax