Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Marisa Hudspeth, Brendan O'Connor, Laure Thompson

TL;DR
This paper reviews Latin treebanks spanning 17 centuries, analyzes the heterogeneity of their annotations, and evaluates POS and morphological feature tagging across time, finding that BERT-based taggers outperform existing taggers and remain robust across periods.
Contribution
It systematically reviews Latin treebanks, creates standardized annotation conversions, and introduces new time-based data splits for cross-temporal morphological tagging evaluation.
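As a rough illustration of what a time-based split involves, the sketch below groups documents into periods by composition date. The period names, the year boundaries, and the `(doc_id, year)` input format are all assumptions made for this example; the summary does not say which periods or boundaries the authors use.

```python
from collections import defaultdict

# Hypothetical period boundaries (year, negative = BCE). These are
# placeholders for illustration, not the splits defined in the paper.
PERIODS = [
    ("period_a", -200, 200),
    ("period_b", 200, 600),
    ("period_c", 600, 1500),
]


def assign_period(year):
    """Return the first period whose half-open range [start, end) contains year."""
    for name, start, end in PERIODS:
        if start <= year < end:
            return name
    return "other"


def split_by_period(documents):
    """Group (doc_id, year) pairs into lists of doc ids keyed by period name."""
    splits = defaultdict(list)
    for doc_id, year in documents:
        splits[assign_period(year)].append(doc_id)
    return dict(splits)
```

Splitting at the document level rather than the sentence level keeps every sentence of a text in a single period, so a tagger evaluated on one period never sees sentences from that period's test texts during training.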
Findings
BERT-based taggers outperform existing taggers on POS and morphological feature tagging.
Taggers show robustness across different historical periods.
The review documents the treebanks' source texts, their overlap, and their coverage across time and genre.
Abstract
Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits drawn from the existing treebanks, which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also…
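To make the idea of an automated feature conversion concrete, here is a minimal sketch. It assumes the annotations are stored as CoNLL-U-style `Attribute=Value|Attribute=Value` strings, which this summary does not state, and it maps a small illustrative subset of values to traditional grammar terms. The table `TRADITIONAL_NAMES` and the function names are hypothetical and do not reproduce the authors' conversion rules.

```python
# Illustrative subset only: maps (attribute, value) pairs in a
# CoNLL-U-style feature string to traditional Latin grammar terms.
TRADITIONAL_NAMES = {
    ("Case", "Nom"): "nominative",
    ("Case", "Gen"): "genitive",
    ("Case", "Dat"): "dative",
    ("Case", "Acc"): "accusative",
    ("Case", "Abl"): "ablative",
    ("Case", "Voc"): "vocative",
    ("Number", "Sing"): "singular",
    ("Number", "Plur"): "plural",
    ("Gender", "Masc"): "masculine",
    ("Gender", "Fem"): "feminine",
    ("Gender", "Neut"): "neuter",
}


def parse_feats(feats):
    """Parse 'Case=Nom|Number=Sing' into a dict; '_' or '' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def to_traditional(feats):
    """Rename known values to traditional terms; unknown values pass through."""
    return {
        attr: TRADITIONAL_NAMES.get((attr, value), value)
        for attr, value in parse_feats(feats).items()
    }
```

For example, `to_traditional("Case=Nom|Gender=Fem|Number=Sing")` yields a dictionary with the values `nominative`, `feminine`, and `singular`. Passing unknown values through unchanged makes gaps in the mapping visible instead of silently dropping annotations.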
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Lexicography and Language Studies
