eFontes. Part of Speech Tagging and Lemmatization of Medieval Latin Texts. A Cross-Genre Survey

Krzysztof Nowak; Jędrzej Ziębura; Krzysztof Wróbel; Aleksander Smywiński-Pohl

arXiv:2407.00418 · cs.CL · July 2, 2024

Open Access

TL;DR

This paper presents the development and evaluation of eFontes models for automatic linguistic annotation of Medieval Latin texts, achieving high accuracy in lemmatization, POS tagging, and morphological analysis using Transformer-based models trained on specialized corpora.

Contribution

It introduces novel Transformer-based models trained on the eFontes corpus for Medieval Latin, addressing orthographic and vernacular challenges in linguistic annotation.

Findings

01 Lemmatization accuracy: 92.60%

02 POS tagging accuracy: 83.29%

03 Morphological features accuracy: 88.57%

Abstract

This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, focusing on lemmatization, part-of-speech tagging, and morphological feature determination. Using the Transformers library, these models were trained on Universal Dependencies (UD) corpora and the newly developed eFontes corpus of Polish Medieval Latin. The research evaluates the models' performance, addressing challenges such as orthographic variations and the integration of Latinized vernacular terms. The models achieved high accuracy rates: lemmatization at 92.60%, part-of-speech tagging at 83.29%, and morphological feature determination at 88.57%. The findings underscore the importance of high-quality annotated corpora and propose future enhancements, including extending the models to Named Entity Recognition.
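
The three accuracy figures above are per-task scores. As a rough illustration of how such token-level accuracy could be computed, here is a minimal sketch that compares gold and predicted annotations field by field. The `Token` type, the `accuracy` helper, and the three-token sample are hypothetical and are not taken from the eFontes corpus or the authors' evaluation code; the paper's exact evaluation protocol is not described in this summary.

```python
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    """One annotated token (hypothetical structure, loosely UD-style)."""
    form: str
    lemma: str
    upos: str
    feats: str


def accuracy(gold: Sequence[Token], pred: Sequence[Token], field: str) -> float:
    """Share of tokens whose `field` value matches between gold and prediction."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be token-aligned")
    if not gold:
        return 0.0
    correct = sum(getattr(g, field) == getattr(p, field) for g, p in zip(gold, pred))
    return correct / len(gold)


# Illustrative toy data only, not drawn from any real corpus.
gold = [
    Token("in", "in", "ADP", "_"),
    Token("nomine", "nomen", "NOUN", "Case=Abl|Gender=Neut|Number=Sing"),
    Token("domini", "dominus", "NOUN", "Case=Gen|Gender=Masc|Number=Sing"),
]
pred = [
    Token("in", "in", "ADP", "_"),
    Token("nomine", "nomen", "NOUN", "Case=Abl|Gender=Neut|Number=Sing"),
    # Assumed model error: wrong case and number on the last token.
    Token("domini", "dominus", "NOUN", "Case=Nom|Gender=Masc|Number=Plur"),
]

lemma_acc = accuracy(gold, pred, "lemma")
upos_acc = accuracy(gold, pred, "upos")
feats_acc = accuracy(gold, pred, "feats")
print(f"lemma={lemma_acc:.2%} upos={upos_acc:.2%} feats={feats_acc:.2%}")
```

Scoring each field separately reflects how the summary reports lemmatization, part-of-speech tagging, and morphological features as distinct numbers; here the morphological features string is treated as a single exact-match unit, which is one assumption among several possible scoring choices.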

Taxonomy

Topics: Translation Studies and Practices · Lexicography and Language Studies · Natural Language Processing Techniques