Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy
Eleni Partalidou, Eleftherios Spyromitros-Xioufis, Stavros Doropoulos, Stavros Vologiannidis, Konstantinos I. Diamantaras

TL;DR
This paper introduces an open-source Greek POS tagger and entity recognizer built with spaCy, achieving high accuracy and extending existing models for Greek language processing.
Contribution
It adds Greek language support to spaCy, develops a morphological POS tagger, and extends NER models, providing new tools for Greek NLP tasks.
Findings
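The Greek POS tagger outperforms previously reported state-of-the-art results when only the part of speech is classified.
The NER model extends the standard ENAMEX types (organization, location, person) for Greek.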
Flexibility in handling out-of-vocabulary words is necessary.
Abstract
This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and the classification of tokens into a small set of named-entity classes. The model architecture is introduced. Greek language support was added to the spaCy source code, a feature that did not exist before our contribution, and was used to build the models. Additionally, a part-of-speech tagger was trained that can detect the morphology of the tokens and outperforms state-of-the-art results when classifying only the part of speech. For named entity recognition using spaCy, a model that extends the standard ENAMEX types (organization, location, person) was built. Experiments indicate the need for flexibility in handling out-of-vocabulary words and there is an…
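The abstract distinguishes between tagging the full morphology of a token and classifying only its part of speech. A minimal sketch of how accuracy could be scored at both granularities follows. It is an illustration only, not the authors' evaluation code: the `POS|Feature=Value|...` tag format, the helper names `split_tag` and `accuracy`, and the toy gold and predicted tags are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def split_tag(tag: str) -> Tuple[str, Dict[str, str]]:
    """Split an assumed 'POS|Feat=Val|...' tag into (POS, feature dict)."""
    pos, *feats = tag.split("|")
    return pos, dict(f.split("=", 1) for f in feats)


def accuracy(gold: List[str], pred: List[str], pos_only: bool) -> float:
    """Token-level accuracy, comparing either the POS alone or the full tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    hits = 0
    for g, p in zip(gold, pred):
        g_pos, g_feats = split_tag(g)
        p_pos, p_feats = split_tag(p)
        if pos_only:
            hits += g_pos == p_pos
        else:
            hits += (g_pos, g_feats) == (p_pos, p_feats)
    return hits / len(gold)


# Toy tags invented for illustration (not results from the paper).
gold = [
    "DET|Case=Nom|Gender=Fem|Number=Sing",
    "NOUN|Case=Nom|Gender=Fem|Number=Sing",
    "VERB|Number=Sing|Person=3",
]
pred = [
    "DET|Case=Nom|Gender=Fem|Number=Sing",
    "NOUN|Case=Acc|Gender=Fem|Number=Sing",  # right POS, wrong case
    "VERB|Number=Sing|Person=3",
]

pos_acc = accuracy(gold, pred, pos_only=True)
full_acc = accuracy(gold, pred, pos_only=False)
print(f"POS-only accuracy: {pos_acc:.2f}, full-tag accuracy: {full_acc:.2f}")
```

In this toy case the second token has the correct part of speech but the wrong case, so it counts as correct under POS-only scoring and as an error under full-tag scoring. This is why a score for the part of speech alone is not directly comparable to a score for the full morphological tag.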
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
