Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages

Matthias Schöffel; Esteban Garces Arias; Marinus Wiedner; Paula Ruppert; Meimingwei Li; Christian Heumann; Matthias Aßenmacher

arXiv:2506.17715·cs.CL·June 24, 2025

TL;DR

This paper investigates the challenges and solutions for POS tagging in low-resource Medieval Romance languages, highlighting the impact of linguistic evolution and spelling variations on modern language models.

Contribution

The paper systematically evaluates techniques such as fine-tuning, prompt engineering, and transfer learning for improving POS tagging in historical languages.

Findings

01 LLMs face limitations with historical language variations

02 Specialized techniques improve tagging accuracy

03 Cross-lingual transfer shows promise for low-resource languages

Abstract

Part-of-speech (POS) tagging remains a foundational component in natural language processing pipelines, particularly critical for historical text analysis at the intersection of computational linguistics and digital humanities. Despite significant advancements in modern large language models (LLMs) for ancient languages, their application to Medieval Romance languages presents distinctive challenges stemming from diachronic linguistic evolution, spelling variations, and labeled data scarcity. This study systematically investigates the central determinants of POS tagging performance across diverse corpora of Medieval Occitan, Medieval Spanish, and Medieval French texts, spanning biblical, hagiographical, medical, and dietary domains. Through rigorous experimentation, we evaluate how fine-tuning approaches, prompt engineering, model architectures, decoding strategies, and cross-lingual…
