LatinCy: Synthetic Trained Pipelines for Latin NLP

Patrick J. Burns (Institute for the Study of the Ancient World, New York University)

arXiv:2305.04365 · cs.CL · May 9, 2023 · 2 cites

TL;DR

LatinCy is a set of trained spaCy pipelines for Latin that lets researchers run standard NLP tasks, including POS tagging, lemmatization, and morphological tagging, within a single framework.

Contribution

The paper introduces LatinCy, a set of general-purpose Latin "core" pipelines for spaCy, trained on a large amount of available Latin data, including all five Latin Universal Dependencies treebanks preprocessed to be compatible with each other.

Findings

Figures reported for the top-performing model:

01  POS tagging accuracy of 97.41%

02  Lemmatization accuracy of 94.66%

03  Morphological tagging accuracy of 92.76%
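
The figures above are accuracy scores. As a minimal sketch, assuming accuracy here means the share of tokens whose predicted label exactly matches the gold label (the listing does not spell out the scoring procedure), the calculation looks like this. The function name `token_accuracy` and the toy labels are illustrative and are not taken from the paper.

```python
def token_accuracy(gold, predicted):
    """Return the percentage of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Toy POS labels for four tokens (hypothetical data, not from the paper).
gold_pos = ["NOUN", "NOUN", "CCONJ", "VERB"]
pred_pos = ["NOUN", "NOUN", "CCONJ", "NOUN"]

print(f"{token_accuracy(gold_pos, pred_pos):.2f}%")  # prints 75.00%
```

The same function applies to lemmas or morphological feature strings, since it only compares labels position by position.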

Abstract

This paper introduces LatinCy, a set of trained general-purpose Latin-language "core" pipelines for use with the spaCy natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g., the top-performing model yields POS tagging, 97.41% accuracy; lemmatization, 94.66% accuracy; morphological tagging, 92.76% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a spaCy model available for NLP work.
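
To make the spaCy integration concrete, here is a hedged sketch of how such a pipeline might be used. It relies only on standard spaCy calls (`spacy.load`, `token.lemma_`, `token.pos_`, `token.morph`). The package name `la_core_web_lg` is an assumption about how the largest model is distributed, and the helper names `annotate` and `format_rows` are hypothetical; spaCy and the model package must be installed separately.

```python
def annotate(text, model_name="la_core_web_lg"):
    """Run a spaCy pipeline over text and return (form, lemma, POS, morphology) rows.

    model_name is an assumed package name; substitute whichever
    LatinCy model is actually installed.
    """
    import spacy  # imported lazily so format_rows works without spaCy installed

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(tok.text, tok.lemma_, tok.pos_, str(tok.morph)) for tok in doc]


def format_rows(rows):
    """Render annotation rows as tab-separated lines, one token per line."""
    return "\n".join("\t".join(row) for row in rows)
```

With a model installed, `print(format_rows(annotate("Gallia est omnis divisa in partes tres.")))` would list each token with its lemma, POS tag, and morphological features. The actual output depends on the model version, so none is shown here.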

Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Translation Studies and Practices