Are Language Models Borrowing-Blind? A Multilingual Evaluation of Loanword Identification across 10 Languages
Mérilin Sousa Silva, Sina Ahmadi

TL;DR
This study evaluates whether multilingual language models can identify loanwords across ten languages. Even with explicit instructions and contextual cues, the models perform poorly, a result that points to a bias toward loanwords and has implications for language preservation.
Contribution
It provides a comprehensive multilingual evaluation of pretrained models' ability to identify loanwords, exposing their limitations and biases in this task.
Findings
Models perform poorly at distinguishing loanwords from native words, even with explicit instructions and contextual information.
Models exhibit a bias towards loanwords over native words.
The results have implications for developing NLP tools that support minority languages.
Abstract
Throughout language history, words are borrowed from one language to another and gradually become integrated into the recipient's lexicon. Speakers can often differentiate these loanwords from native vocabulary, particularly in bilingual communities where a dominant language continuously imposes lexical items on a minority language. This paper investigates whether pretrained language models, including large language models, possess similar capabilities for loanword identification. We evaluate multiple models across 10 languages. Despite explicit instructions and contextual information, our results show that models perform poorly in distinguishing loanwords from native ones. These findings corroborate previous evidence that modern NLP systems exhibit a bias toward loanwords rather than native equivalents. Our work has implications for developing NLP tools for minority languages and…
