TL;DR
RobeCzech is a Czech-specific RoBERTa model that significantly outperforms multilingual and previous Czech models across multiple NLP tasks, setting new state-of-the-art benchmarks.
Contribution
This paper introduces RobeCzech, a monolingual Czech language model based on RoBERTa, achieving superior performance over existing models in various NLP tasks.
Findings
Outperforms multilingual and Czech-trained models
Achieves state-of-the-art in five NLP tasks
Reaches top results in four tasks
Abstract
We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsMulti-Head Attention · Linear Layer · Linear Warmup With Linear Decay · WordPiece · Layer Normalization · Attention Dropout · Softmax · Refunds@Expedia|||How do I get a full refund from Expedia? · Dropout · Attention Is All You Need
