Corpus and Models for Lemmatisation and POS-tagging of Old French
Jean-Baptiste Camps, Thibault Clérice, Frédéric Duval, Lucence Ing, Naomi Kanaoka, Ariane Pinche

TL;DR
This paper presents a neural approach to lemmatization and POS tagging for Old French, leveraging dedicated corpora to address the language's variation and resource scarcity.
Contribution
It introduces new neural models and a corpus construction methodology specifically tailored for Old French linguistic analysis.
Findings
Successful development of lemmatization and POS-tagging models for Old French
Demonstration of the effectiveness of neural taggers on historical language data
Progress in creating dedicated corpora for under-resourced languages
Abstract
Old French is a typical example of an under-resourced historical language that, moreover, displays a significant amount of linguistic variation. In this paper, we present the current results of a long-running project (2015-...) and describe how we approached the difficult question of providing lemmatisation and POS models for Old French with the help of neural taggers and the progressive constitution of dedicated corpora.
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Linguistics and Terminology Studies
