A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek
Giuseppe G. A. Celano

TL;DR
This paper compares six models to identify a state-of-the-art morphosyntactic parser and lemmatizer for Ancient Greek, highlighting the importance of specialized modeling strategies for syntactic accuracy.
Contribution
It introduces a comprehensive comparison of models for Ancient Greek parsing and lemmatization, identifying the most effective approaches and providing resources for future research.
Findings
Dithrax and Trankit perform similarly in morphology annotation.
Trankit excels in syntactic annotation.
GreTa provides the best lemmatization results.
Abstract
This paper presents an experiment comparing six models to identify a state-of-the-art morphosyntactic parser and lemmatizer for Ancient Greek that annotates according to the Ancient Greek Dependency Treebank annotation scheme. A normalized version of the major collections of annotated texts was used to (i) train the baseline model Dithrax with randomly initialized character embeddings and (ii) fine-tune Trankit and four recent models pretrained on Ancient Greek texts, i.e., GreBERTa and PhilBERTa for morphosyntactic annotation and GreTa and PhilTa for lemmatization. A Bayesian analysis shows that Dithrax and Trankit annotate morphology practically equivalently, while syntax is best annotated by Trankit and lemmata by GreTa. The results of the experiment suggest that token embeddings are not sufficient to achieve high UAS and LAS scores unless they are…
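The abstract measures syntactic quality with UAS (unlabeled attachment score) and LAS (labeled attachment score). As a minimal sketch, assuming the gold and predicted analyses are already token-aligned and that every token is scored (real evaluation scripts differ, for example, on whether punctuation is counted), UAS is the share of tokens whose predicted head is correct, and LAS is the share whose head and dependency label are both correct. The `Token` class and the `attachment_scores` function are hypothetical names for illustration, not part of Dithrax, Trankit, or the paper's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Token:
    head: int      # index of the syntactic head (0 = artificial root)
    relation: str  # dependency label, e.g. AGDT-style "SBJ", "OBJ", "PRED"


def attachment_scores(gold: Sequence[Token], pred: Sequence[Token]) -> tuple[float, float]:
    """Return (UAS, LAS) over token-aligned gold and predicted analyses."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be token-aligned")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    heads_ok = 0    # tokens with the correct head
    labeled_ok = 0  # tokens with the correct head AND the correct label
    for g, p in zip(gold, pred):
        if g.head == p.head:
            heads_ok += 1
            if g.relation == p.relation:
                labeled_ok += 1
    n = len(gold)
    return heads_ok / n, labeled_ok / n


# Usage: a four-token sentence where the parser mislabels token 3
# and attaches token 4 to the wrong head.
gold = [Token(2, "SBJ"), Token(0, "PRED"), Token(2, "OBJ"), Token(3, "ATR")]
pred = [Token(2, "SBJ"), Token(0, "PRED"), Token(2, "ADV"), Token(2, "ATR")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

Because LAS only credits tokens that already have the correct head, LAS can never exceed UAS; the gap between the two reflects labeling errors on otherwise correct attachments.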
