UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
Milan Straka, Jana Straková, Jan Hajič

TL;DR
This paper enhances a multilingual morphological analysis and lemmatization system (UDPipe 2.0) with contextualized embeddings, morphological feature regularization, and merging of same-language corpora. In the SIGMORPHON 2019 shared task it placed first in lemmatization and close behind the top system in morphological analysis.
Contribution
It introduces modifications to UDPipe 2.0, including BERT embeddings, morphological feature regularization, and corpus merging, to improve crosslingual morphological analysis and lemmatization.
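As an illustration of what using individual morphological features as regularization could involve, the sketch below splits composite tags into their component features and builds multi-hot auxiliary targets that a model could predict alongside the full tag. This is not the authors' implementation: the semicolon-separated tag format, the multi-hot encoding, and all function names here are assumptions made for the example.

```python
from typing import Dict, List


def split_features(tag: str, sep: str = ";") -> List[str]:
    """Split a composite morphological tag into individual features.

    Assumption: features are joined by `sep`, e.g. "N;NOM;SG".
    """
    return [feat for feat in tag.split(sep) if feat]


def build_feature_vocab(tags: List[str]) -> Dict[str, int]:
    """Assign an index to every individual feature, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tag in tags:
        for feat in split_features(tag):
            vocab.setdefault(feat, len(vocab))
    return vocab


def auxiliary_targets(tag: str, vocab: Dict[str, int]) -> List[int]:
    """Build a multi-hot vector marking which features occur in `tag`.

    Hypothetical auxiliary target: a model would predict this vector
    in addition to the full composite tag.
    """
    target = [0] * len(vocab)
    for feat in split_features(tag):
        target[vocab[feat]] = 1
    return target


# Toy tag set, invented for illustration only.
tags = ["N;NOM;SG", "V;PST;SG", "N;ACC;PL"]
vocab = build_feature_vocab(tags)
targets = [auxiliary_targets(tag, vocab) for tag in tags]
```

In a setup like this, the auxiliary per-feature loss would be added to the main tag-classification loss during training, while only the full-tag prediction is used at inference time.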
Findings
Lemmatization accuracy of 95.78%, ahead of all other submitted systems (second best 95.00%, third 94.46%).
Morphological analysis accuracy of 93.19%, closely behind the top system.
System performed strongly across multiple languages in the shared task.
Abstract
We present our contribution to the SIGMORPHON 2019 Shared Task: Crosslinguality and Context in Morphology, Task 2: contextual morphological analysis and lemmatization. We submitted a modification of UDPipe 2.0, one of the best-performing systems of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies and an overall winner of the 2018 Shared Task on Extrinsic Parser Evaluation. As our first improvement, we use pretrained contextualized embeddings (BERT) as additional inputs to the network; secondly, we use individual morphological features as regularization; and finally, we merge selected corpora of the same language. In the lemmatization task, our system exceeds all the submitted systems by a wide margin, with a lemmatization accuracy of 95.78 (second best was 95.00, third 94.46). In the morphological analysis, our system placed tightly…
