GottBERT: a pure German Language Model
Raphael Scheible, Fabian Thomczyk, Patric Tippmann, Victor Jaravine, Martin Boeker

TL;DR
GottBERT is a German-only language model based on RoBERTa. It outperforms the German and multilingual models it was compared against on several NLP tasks and is released to support German NLP research.
Contribution
This work presents GottBERT, the first German single-language RoBERTa model. It was trained on the German portion of the OSCAR data set and evaluated on multiple NLP tasks, where it outperforms existing models.
Findings
GottBERT outperforms all tested German and multilingual models on NER and text classification tasks.
It was successfully pre-trained on a 256-core TPU using the RoBERTa BASE architecture.
GottBERT is publicly available under the AGPLv3 license.
Abstract
Pre-trained language models have recently advanced the field of natural language processing (NLP). The introduction of Bidirectional Encoder Representations from Transformers (BERT) and its optimized version RoBERTa has had a significant impact and increased the relevance of pre-trained models. Research in this field initially focused on English data, followed by models trained on multilingual text corpora. However, current research shows that multilingual models are inferior to monolingual models. To date, no German single-language RoBERTa model has been published; we introduce one in this work (GottBERT). The German portion of the OSCAR data set was used as the text corpus. In an evaluation, we compare its performance on the two Named Entity Recognition (NER) tasks CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD, with existing German single…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Linear Layer · OSCAR · WordPiece · Residual Connection · Dense Connections · Attention Is All You Need · Softmax · Dropout · RoBERTa
