From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

Matthias Schöffel; Esteban Garces Arias

arXiv:2605.09147 · cs.CL · May 12, 2026

TL;DR

This study evaluates large language models for POS tagging in medieval Romance languages (Medieval Occitan, Medieval Catalan, and Medieval French). It shows that they outperform traditional taggers, especially when combined with transfer learning, and it provides practical guidance for digital humanities applications.

Contribution

It offers a systematic empirical comparison of LLMs and traditional taggers for medieval languages, highlights transfer learning strategies, and releases resources for reproducibility.
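The summary does not state which metric the comparison uses. The sketch below assumes token-level accuracy, the usual headline metric for POS tagging, and shows how several taggers can be scored against the same gold sequence. The tagger names, the toy Old French fragment, and its Universal Dependencies (UPOS) tags are illustrative assumptions, not data or results from the paper.

```python
from typing import Dict, Sequence


def token_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if len(gold) != len(predicted):
        # Accuracy is only meaningful when predictions align token by token.
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def compare_taggers(
    gold: Sequence[str], outputs: Dict[str, Sequence[str]]
) -> Dict[str, float]:
    """Score each (hypothetical) tagger's output against the same gold tags."""
    return {name: token_accuracy(gold, tags) for name, tags in outputs.items()}


if __name__ == "__main__":
    # Toy fragment "li rois vint" ("the king came") with UPOS tags;
    # illustrative only, not taken from the paper's datasets.
    gold = ["DET", "NOUN", "VERB"]
    outputs = {
        "baseline_tagger": ["DET", "PROPN", "VERB"],  # hypothetical output
        "llm_tagger": ["DET", "NOUN", "VERB"],        # hypothetical output
    }
    print(compare_taggers(gold, outputs))
```

In a real evaluation the per-sentence counts would be summed over the whole test set before dividing, so that longer sentences carry proportionally more weight.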

Findings

01. LLM-based approaches consistently outperform traditional POS taggers across the medieval varieties studied.

02. Fine-tuning and multilingual training yield the largest improvements in LLM performance.

03. Cross-lingual transfer learning substantially benefits under-resourced medieval language varieties.

Abstract

Part-of-speech (POS) tagging for Medieval Romance languages remains challenging due to orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning settings. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training…
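The abstract does not describe the prompt format used for the zero-shot and few-shot settings. The sketch below shows one common way to set them up: a prompt that lists the allowed tags, optionally followed by tagged example sentences (none for zero-shot), and a parser that maps the model's `token/TAG` answer back onto the input tokens. The prompt wording, the `token/TAG` output convention, the fallback tag `X`, and the use of the 17 Universal Dependencies UPOS tags are all assumptions for illustration; the paper's tagset and prompts may differ.

```python
from typing import List, Sequence, Tuple

# Assumption: the Universal Dependencies UPOS inventory as the target tagset.
UPOS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]


def build_few_shot_prompt(
    examples: Sequence[Tuple[Sequence[str], Sequence[str]]],
    sentence: Sequence[str],
    tagset: Sequence[str],
) -> str:
    """Build a tagging prompt; pass examples=[] for the zero-shot setting."""
    lines = [
        "Assign a part-of-speech tag to each token.",
        "Allowed tags: " + ", ".join(tagset),
        "Answer with one token/TAG pair per token, separated by spaces.",
        "",
    ]
    for tokens, tags in examples:
        lines.append("Sentence: " + " ".join(tokens))
        lines.append("Tags: " + " ".join(f"{t}/{g}" for t, g in zip(tokens, tags)))
        lines.append("")
    lines.append("Sentence: " + " ".join(sentence))
    lines.append("Tags:")
    return "\n".join(lines)


def parse_tagged_output(
    text: str, sentence: Sequence[str], tagset: Sequence[str], fallback: str = "X"
) -> List[str]:
    """Align a 'token/TAG ...' answer to the input, using fallback on mismatch."""
    pairs = text.split()
    allowed = set(tagset)
    tags = []
    for i, token in enumerate(sentence):
        tag = fallback
        if i < len(pairs):
            # rpartition splits on the last "/" so tokens containing "/" survive.
            word, sep, candidate = pairs[i].rpartition("/")
            if sep and word == token and candidate in allowed:
                tag = candidate
        tags.append(tag)
    return tags


if __name__ == "__main__":
    # Toy Old French tokens; illustrative only, not the paper's data.
    demo = [(["li", "rois", "vint"], ["DET", "NOUN", "VERB"])]
    print(build_few_shot_prompt(demo, ["la", "dame", "vint"], UPOS))
    print(parse_tagged_output("la/DET dame vint/FOO", ["la", "dame", "vint"], UPOS))
```

Forcing every answer back onto the input tokens keeps LLM output comparable with traditional taggers: a malformed or out-of-tagset answer is scored as an error rather than silently shifting the alignment of later tokens.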

Code & Models

Repositories: GitHub
