Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech
Milan Straka, Jana Straková

TL;DR
This paper introduces an open-source web service for Czech morphosyntactic analysis that combines deep learning with a morphological dictionary, reducing errors relative to the existing analyser MorphoDiTa by 50% in lemmatization and 58% in POS tagging.
Contribution
The novel hybrid approach integrates deep learning with a morphological dictionary at inference time, enhancing Czech morphosyntactic analysis accuracy.
Findings
50% error reduction in lemmatization over MorphoDiTa
58% error reduction in POS tagging over MorphoDiTa
Improved disambiguation and out-of-vocabulary handling
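The error-reduction figures above are presumably relative reductions rather than absolute accuracy gains. A minimal sketch of that arithmetic, assuming the conventional definition (baseline error minus new error, divided by baseline error). The function name and the accuracies in the usage line are hypothetical placeholders, not values reported in the paper:

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the fraction of the baseline's errors removed by the new system.

    Both arguments are accuracies in [0, 1]. This is the conventional
    definition of relative error reduction; it is assumed here, not quoted
    from the paper.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_err - new_err) / baseline_err


# Hypothetical accuracies, for illustration only: halving the error rate
# from 4% to 2% is a 50% relative error reduction.
example = relative_error_reduction(0.96, 0.98)
```

Under this definition, a 50% reduction means half of the baseline's errors are gone, however small the absolute accuracy difference is.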
Abstract
We present an open-source web service for Czech morphosyntactic analysis. The system combines a deep learning model with rescoring by a high-precision morphological dictionary at inference time. We show that our hybrid method surpasses two competitive baselines: the deep learning model ensures generalization to out-of-vocabulary words and better disambiguation, an improvement over the existing morphological analyser MorphoDiTa, while at the same time benefiting from inference-time guidance by a manually curated morphological dictionary. We achieve 50% error reduction in lemmatization and 58% error reduction in POS tagging over MorphoDiTa, while also offering dependency parsing. The model is trained on one of the currently largest Czech morphosyntactic corpora, the PDT-C 1.0, with the trained models available at https://hdl.handle.net/11234/1-5293. We provide the…
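The abstract describes dictionary rescoring only at a high level. The following is one way such inference-time guidance could look, not the authors' implementation. It assumes the model yields a score for each candidate (lemma, tag) analysis and that the dictionary maps a word form to its licensed analyses. The `rescore` function, the `Analysis` type, the simplified tags, and all scores are hypothetical:

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one analysis: (lemma, simplified tag).
Analysis = Tuple[str, str]


def rescore(
    form: str,
    model_scores: Dict[Analysis, float],
    dictionary: Dict[str, Set[Analysis]],
) -> Analysis:
    """Pick the best-scoring analysis, preferring dictionary-licensed ones.

    If the dictionary knows the form, only analyses it licenses are
    considered. For out-of-vocabulary forms, or when no licensed analysis
    was scored, fall back to the model's own top choice.
    """
    allowed = dictionary.get(form)
    if allowed:
        licensed = {a: s for a, s in model_scores.items() if a in allowed}
        if licensed:
            return max(licensed, key=licensed.get)
    return max(model_scores, key=model_scores.get)


# Toy data: the form "ženu" is ambiguous between the noun "žena" and the
# verb "hnát"; the scores are invented for illustration.
toy_dictionary = {"ženu": {("žena", "NOUN"), ("hnát", "VERB")}}
toy_scores = {
    ("ženu", "NOUN"): 0.6,  # unlicensed lemma guess, vetoed by the dictionary
    ("žena", "NOUN"): 0.5,
    ("hnát", "VERB"): 0.3,
}
best = rescore("ženu", toy_scores, toy_dictionary)
```

In this sketch the dictionary acts as a hard filter for known forms, while unknown forms rely entirely on the model, mirroring the division of labour the abstract describes. A real system might instead combine the two sources with soft weights.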
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Authorship Attribution and Profiling
