LuxBorrow: From Pompier to Pompjee, Tracing Borrowing in Luxembourgish
Nina Hosseini-Kivanani, Fred Philippy

TL;DR
This study analyzes 27 years of Luxembourgish news to understand borrowing patterns, revealing persistent multilingual practices, dominant French influence, and evolving morphosyntactic adaptations through a comprehensive computational pipeline.
Contribution
It introduces LuxBorrow, a novel pipeline combining language identification and borrowing resolution to analyze lexical borrowing and code-switching in Luxembourgish news over time.
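The two-stage design described above (sentence-level language identification, then token-level resolution on Luxembourgish sentences only) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_lemma`, `resolve_borrowings`, and the `identify_language` callable are hypothetical names, the lemmatizer is a lowercase-and-strip stand-in, and the registry holds a single entry taken from the paper's title (French "pompier" adapted as "Pompjee"). The compiled morphological and orthographic rules mentioned in the paper are not modeled here.

```python
from typing import Callable, Dict, List, Tuple

# Toy loanword registry (assumption): adapted LU form -> (donor language, donor form).
# The single entry comes from the paper's title; a real registry would be far larger.
REGISTRY: Dict[str, Tuple[str, str]] = {"pompjee": ("FR", "pompier")}


def toy_lemma(token: str) -> str:
    """Stand-in for a real lemmatizer: lowercase and strip surrounding punctuation."""
    return token.lower().strip(".,;:!?\"'()")


def resolve_borrowings(
    sentences: List[str],
    identify_language: Callable[[str], str],
    registry: Dict[str, Tuple[str, str]] = REGISTRY,
) -> List[Tuple[str, str, str]]:
    """Return (token, donor language, donor form) for registry hits in LU sentences."""
    hits = []
    for sentence in sentences:
        # Stage 1: sentence-level language ID; non-LU sentences are skipped entirely.
        if identify_language(sentence) != "LU":
            continue
        # Stage 2: token-level lookup, restricted to LU sentences.
        for token in sentence.split():
            lemma = toy_lemma(token)
            if lemma in registry:
                donor_lang, donor_form = registry[lemma]
                hits.append((token, donor_lang, donor_form))
    return hits
```

Restricting the resolver to LU sentences keeps whole French or German sentences (code-switching) from being counted as a string of individual borrowings.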
Findings
Luxembourgish (LU) remains the matrix language across all documents.
Multilingual practice is pervasive: 77.1% of articles include at least one donor language, and 65.4% use three or four languages.
Morphological and orthographic adaptations are the primary forms of borrowing.
Abstract
We present LuxBorrow, a borrowing-first analysis of Luxembourgish (LU) news spanning 27 years (1999-2025), covering 259,305 RTL articles and 43.7M tokens. Our pipeline combines sentence-level language identification (LU/DE/FR/EN) with a token-level borrowing resolver restricted to LU sentences, using lemmatization, a collected loanword registry, and compiled morphological and orthographic rules. Empirically, LU remains the matrix language across all documents, while multilingual practice is pervasive: 77.1% of articles include at least one donor language and 65.4% use three or four. Breadth does not imply intensity: median code-mixing index (CMI) increases from 3.90 (LU+1) to only 7.00 (LU+3), indicating localized insertions rather than balanced bilingual text. Domain and period summaries show moderate but persistent mixing, with CMI rising from 6.1 (1999-2007) to a peak of 8.4 in 2020.…
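To make the code-mixing index concrete, here is a minimal sketch assuming the commonly used formulation CMI = 100 × (N − max_i w_i) / N, where N counts the language-tagged tokens and max_i w_i is the count of the most frequent language. The abstract does not state which CMI variant the paper uses, so treat the formula, the function name `code_mixing_index`, and the `neutral_tag` parameter as assumptions.

```python
from collections import Counter
from typing import Iterable


def code_mixing_index(lang_tags: Iterable[str], neutral_tag: str = "other") -> float:
    """Code-mixing index over per-token language tags (assumed formulation).

    Tokens tagged `neutral_tag` (e.g. numbers or punctuation) are treated as
    language-independent and excluded from the count.
    """
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    n_lang = sum(counts.values())
    if n_lang == 0:
        # No language-tagged tokens: define the index as 0 (no mixing).
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (n_lang - dominant) / n_lang
```

Under this formulation, a text with nine LU tokens and one FR token scores 10, and a monolingual text scores 0. Single-digit medians like those reported above would therefore correspond to texts in which the dominant language accounts for well over 90% of tagged tokens.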
Taxonomy
Topics: Linguistics, Language Diversity, and Identity · Linguistic research and analysis · Multilingual Education and Policy
