Latin BERT: A Contextual Language Model for Classical Philology
David Bamman, Patrick J. Burns

TL;DR
Latin BERT is a specialized language model trained on extensive Latin texts, enabling advanced NLP tasks like POS tagging, text completion, and semantic search, thereby enhancing computational approaches in classical philology.
Contribution
This paper introduces Latin BERT, the first contextual language model for Latin, which achieves state-of-the-art results in Latin part-of-speech tagging and supports a range of NLP applications in classical studies.
Findings
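Latin BERT achieves a new state of the art for Latin POS tagging on all three Universal Dependencies datasets for Latin.
It outperforms static word embeddings in word sense disambiguation, evaluated on a newly created Latin dataset.
It enables semantically informed search by querying contextual nearest neighbors.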
Abstract
We present Latin BERT, a contextual language model for the Latin language, trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century. In a series of case studies, we illustrate the affordances of this language-specific model both for work in natural language processing for Latin and in using computational methods for traditional scholarship: we show that Latin BERT achieves a new state of the art for part-of-speech tagging on all three Universal Dependency datasets for Latin and can be used for predicting missing text (including critical emendations); we create a new dataset for assessing word sense disambiguation for Latin and demonstrate that Latin BERT outperforms static word embeddings; and we show that it can be used for semantically-informed search by querying contextual nearest neighbors. We publicly release trained models to help…
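The contextual nearest-neighbor search described in the abstract can be sketched as ranking stored token representations by their cosine similarity to a query token's representation. This is a minimal illustration, not the authors' code: it assumes the contextual vectors have already been extracted from Latin BERT, the 3-dimensional vectors and passage IDs are made-up placeholders, and the function names `cosine` and `nearest_neighbors` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(
    query: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(label, cosine(query, vec)) for label, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical placeholder data: real contextual vectors would come from the
# model and be much higher-dimensional; each entry stands for one token in one passage.
index = [
    ("passage_A", [0.9, 0.1, 0.0]),
    ("passage_B", [0.0, 1.0, 0.0]),
    ("passage_C", [0.7, 0.0, 0.7]),
]
query = [1.0, 0.0, 0.0]  # hypothetical vector for the query token in its context

# prints passage_A 0.994, then passage_C 0.707
for label, score in nearest_neighbors(query, index, k=2):
    print(f"{label}\t{score:.3f}")
```

Because each vector encodes a token in its context, the same word form used in different senses can land far apart, which is what makes this kind of search semantically informed rather than purely lexical.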
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Digital Humanities and Scholarship
Methods: Linear Layer · Adam · Softmax · Dense Connections · Weight Decay · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Layer Normalization
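The "Linear Warmup With Linear Decay" entry names a learning-rate schedule that rises linearly from zero to a peak over a warmup period and then falls linearly back to zero by the final step. A minimal sketch follows; the function name `linear_warmup_linear_decay` and the step counts in the example are hypothetical, and the schedule settings actually used to train Latin BERT are not stated here.

```python
def linear_warmup_linear_decay(
    step: int, peak_lr: float, warmup_steps: int, total_steps: int
) -> float:
    """Learning rate at a given step under linear warmup, then linear decay.

    Rises from 0 at step 0 to peak_lr at step == warmup_steps, then falls
    linearly to 0 at step == total_steps (and stays at 0 afterwards).
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Warmup phase: fraction of the warmup period completed so far.
        return peak_lr * step / warmup_steps
    # Decay phase: fraction of the post-warmup period still remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Hypothetical settings: 10 warmup steps out of 110 total, peak 1e-4.
for s in (0, 5, 10, 60, 110):
    print(s, linear_warmup_linear_decay(s, 1e-4, 10, 110))
```

The warmup avoids large, poorly conditioned updates while optimizer statistics (such as Adam's moment estimates) are still settling; the linear decay then shrinks step sizes as training converges.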
