Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

Ahan Chatterjee; Matthias Schöffel; Matthias Aßenmacher; Esteban Garces Arias

arXiv:2605.09156 · cs.CL · May 12, 2026

TL;DR

This paper introduces an interpretable deep learning framework to analyze the historical shift in grammatical gender from Latin to Occitan, focusing on lexical and contextual factors.

Contribution

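The paper presents a novel tokenizer for this low-resource historical setting, together with analyses of gender prediction that quantify how morphological features (at the lexical level) and part-of-speech categories (at the contextual level) contribute to it.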

Findings

01

The proposed tokenizer improves gender prediction performance over conventional tokenization baselines in low-resource settings.

02

Morphological features significantly contribute to lexical gender prediction.

03

Part-of-speech categories influence contextual gender prediction.

Abstract

The diachronic evolution from Latin to the Romance languages involved a restructuring of the grammatical gender system from a tripartite configuration (masculine, feminine, neuter) to a bipartite one (masculine, feminine). In this work, we introduce an interpretable deep learning framework to investigate this phenomenon at both lexical and contextual levels. First, we show that conventional tokenization strategies are insufficiently robust for this low-resource historical setting, and that our proposed tokenizer improves performance over these baselines. At the lexical level, we evaluate the contribution of morphological features to gender prediction. At the contextual level, we quantify the contributions of different part-of-speech categories to grammatical gender prediction. Together, these analyses characterize the distribution of gender information between the lemma and its…
