Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER
Milan Straka, Jana Straková, Jan Hajič

TL;DR
This paper evaluates the effectiveness of contextual embeddings, specifically BERT and Flair, on Czech language processing tasks, achieving state-of-the-art results in POS tagging, lemmatization, parsing, and NER.
Contribution
It introduces a comprehensive evaluation of BERT and Flair embeddings on multiple Czech NLP tasks, demonstrating their superior performance over previous methods.
Findings
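State-of-the-art results on all four tasks: POS tagging, lemmatization, dependency parsing, and NER
Models using BERT and Flair embeddings outperform previously reported models
Results hold on the Prague Dependency Treebank 3.5, Universal Dependencies 2.3, and the Czech Named Entity Corpus 1.1 and 2.0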
Abstract
Contextualized embeddings, which capture appropriate word meaning depending on context, have recently been proposed. We evaluate two methods for precomputing such embeddings, BERT and Flair, on four Czech text processing tasks: part-of-speech (POS) tagging, lemmatization, dependency parsing and named entity recognition (NER). The first three tasks, POS tagging, lemmatization and dependency parsing, are evaluated on two corpora: the Prague Dependency Treebank 3.5 and the Universal Dependencies 2.3. The named entity recognition (NER) is evaluated on the Czech Named Entity Corpus 1.1 and 2.0. We report state-of-the-art results for the above-mentioned tasks and corpora.
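The abstract does not say how precomputed subword-level embeddings are aligned with the words that a tagger or parser operates on. One common approach with a subword tokenizer such as WordPiece is to average the vectors of all subwords belonging to a word. The sketch below illustrates that idea only; it is an assumption rather than the authors' documented procedure, and the function name `pool_subwords` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def pool_subwords(
    subword_vectors: Sequence[Sequence[float]],
    word_ids: Sequence[int],
) -> List[List[float]]:
    """Average subword vectors that share a word index.

    subword_vectors[i] is the embedding of the i-th subword and
    word_ids[i] is the index of the word that subword belongs to.
    Returns one averaged vector per word, ordered by word index.
    """
    if len(subword_vectors) != len(word_ids):
        raise ValueError("need exactly one word id per subword vector")

    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, wid in zip(subword_vectors, word_ids):
        if wid not in sums:
            sums[wid] = [0.0] * len(vec)
            counts[wid] = 0
        # Accumulate the component-wise sum for this word.
        sums[wid] = [s + v for s, v in zip(sums[wid], vec)]
        counts[wid] += 1

    # Divide each sum by the number of subwords in the word.
    return [[s / counts[wid] for s in sums[wid]] for wid in sorted(sums)]


# Toy example: the first word is split into two subwords, the second is not.
vectors = [[1.0, 3.0], [3.0, 5.0], [2.0, 2.0]]
ids = [0, 0, 1]
word_vectors = pool_subwords(vectors, ids)
```

Averaging keeps the word vector at the same dimensionality as the subword vectors, so a downstream tagger can take it as input regardless of how many pieces each word was split into.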
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax
