Morphological Analyzer and Generator for Russian and Ukrainian Languages
Mikhail Korobov

TL;DR
pymorphy2 is an open-source morphological analyzer and generator for Russian and Ukrainian. It combines large lexicons with linguistic rules to analyze and generate word forms, including forms of out-of-vocabulary words.
Contribution
The paper introduces a Python tool that pairs efficiently encoded lexicons with linguistically motivated rules for morphological analysis and generation, extends both to out-of-vocabulary words, and reports state-of-the-art analysis quality for Russian.
Findings
State-of-the-art analysis quality for Russian
Supports Ukrainian language analysis and generation
Open-source with emphasis on usability and extensibility
Abstract
pymorphy2 is a morphological analyzer and generator for the Russian and Ukrainian languages. It uses large, efficiently encoded lexicons built from OpenCorpora and LanguageTool data. A set of linguistically motivated rules is developed to enable morphological analysis and generation of out-of-vocabulary words observed in real-world documents. For Russian, pymorphy2 provides state-of-the-art morphological analysis quality. The analyzer is implemented in the Python programming language with optional C++ extensions. Emphasis is put on ease of use, documentation and extensibility. The package is distributed under a permissive open-source license, encouraging its use in both academic and commercial settings.
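The abstract does not spell out the rules used for out-of-vocabulary words. As a rough illustration of the general idea only, the sketch below guesses the lemma and tag of an unknown word by analogy with the endings of known words. The mini-lexicon, the tag strings, and the function names `build_suffix_table` and `analyze` are all hypothetical and are not pymorphy2's data, API, or algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word form -> (lemma, tag).
# Real word forms are often ambiguous; this toy keeps one reading each.
LEXICON = {
    "кошка": ("кошка", "NOUN,nomn,sing"),
    "кошки": ("кошка", "NOUN,gent,sing"),
    "ложка": ("ложка", "NOUN,nomn,sing"),
    "ложки": ("ложка", "NOUN,gent,sing"),
    "читает": ("читать", "VERB,3per,sing"),
    "играет": ("играть", "VERB,3per,sing"),
}

MAX_SUFFIX_LEN = 3


def build_suffix_table(lexicon, max_len=MAX_SUFFIX_LEN):
    """Map each form ending to counts of (lemma ending, tag) rules."""
    table = defaultdict(Counter)
    for form, (lemma, tag) in lexicon.items():
        for n in range(1, max_len + 1):
            if n >= len(form):
                break
            stem, suffix = form[:-n], form[-n:]
            # Keep the rule only if the lemma shares this stem.
            if lemma.startswith(stem):
                table[suffix][(lemma[len(stem):], tag)] += 1
    return table


SUFFIX_TABLE = build_suffix_table(LEXICON)


def analyze(word):
    """Return (lemma, tag) from the lexicon, or a suffix-based guess."""
    if word in LEXICON:
        return LEXICON[word]
    # Try the longest matching ending first, then shorter ones.
    for n in range(min(MAX_SUFFIX_LEN, len(word) - 1), 0, -1):
        rules = SUFFIX_TABLE.get(word[-n:])
        if rules:
            (lemma_suffix, tag), _count = rules.most_common(1)[0]
            return word[:-n] + lemma_suffix, tag
    return None  # no known ending matches


print(analyze("кошки"))   # known word, read from the lexicon
print(analyze("мышки"))   # unknown word, guessed from the ending "шки"
print(analyze("думает"))  # unknown word, guessed from the ending "ает"
```

In this toy, "мышки" is not in the lexicon, but its ending matches "кошки", so the guess reuses that word's lemma ending and tag. A production analyzer would need far larger lexicons, ranked alternative analyses, and additional rules beyond this single heuristic.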
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Mathematics, Computing, and Information Processing
