A Computational Approach to Language Contact -- A Case Study of Persian

Ali Basirat; Danial Namazifard; Navid Baradaran Hemmati

arXiv:2601.20592 · cs.CL · January 29, 2026

TL;DR

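This study probes how a Persian-trained monolingual language model encodes morphosyntactic information when exposed to languages with varying degrees and types of contact with Persian. Universal syntactic information is largely insensitive to historical contact, whereas morphological features such as Case and Gender are strongly shaped by language-specific structure, which suggests that contact effects in the model are selective and structurally constrained.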

Contribution

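It introduces a methodology that quantifies how much linguistic information a monolingual language model's intermediate representations encode and how that information is distributed across model components, and applies it to Persian to look for structural traces of language contact.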

Findings

01

Universal syntactic information is largely insensitive to historical contact.

02

Morphological features such as Case and Gender are strongly shaped by language-specific structure.

03

Contact effects in monolingual language models are selective and structurally constrained.

Abstract

We investigate structural traces of language contact in the intermediate representations of a monolingual language model. Focusing on Persian (Farsi) as a historically contact-rich language, we probe the representations of a Persian-trained model when exposed to languages with varying degrees and types of contact with Persian. Our methodology quantifies the amount of linguistic information encoded in intermediate representations and assesses how this information is distributed across model components for different morphosyntactic features. The results show that universal syntactic information is largely insensitive to historical contact, whereas morphological features such as Case and Gender are strongly shaped by language-specific structure, suggesting that contact effects in monolingual language models are selective and structurally constrained.
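The abstract does not specify the probe, the model, or the data, so the following is only a generic sketch of how the amount of feature information in a layer's representations might be estimated with a linear probe. Everything here is an assumption made for illustration: the synthetic "layers", the three-class label standing in for a feature such as Case, and the helper names `fit_linear_probe`, `probe_accuracy`, and `make_synthetic_layer`. It is not the authors' method.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, ridge=1e-2):
    """Fit a ridge-regularised least-squares classifier onto one-hot targets."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[y]                       # one-hot encode the labels
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(W, X, y):
    """Share of examples whose highest-scoring class matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))

def make_synthetic_layer(y, n_classes, dim, signal, rng):
    """Hypothetical stand-in for one layer: class means scaled by `signal`
    plus unit Gaussian noise. Real inputs would be model hidden states."""
    means = rng.normal(size=(n_classes, dim))
    return signal * means[y] + rng.normal(size=(y.size, dim))

rng = np.random.default_rng(0)
n_classes, dim, n = 3, 16, 600
y = rng.integers(0, n_classes, size=n)      # assumed labels, e.g. a Case value
train, test = slice(0, 400), slice(400, n)  # simple held-out split

# Assumed signal strengths: layer 0 carries no label information, layer 2 a lot.
results = {}
for layer, signal in enumerate([0.0, 0.5, 2.0]):
    X = make_synthetic_layer(y, n_classes, dim, signal, rng)
    W = fit_linear_probe(X[train], y[train], n_classes)
    results[layer] = probe_accuracy(W, X[test], y[test])

# Majority-class baseline on the test split, for comparison with the probe.
majority = float(np.max(np.bincount(y[test], minlength=n_classes)) / y[test].size)
for layer, acc in results.items():
    print(f"layer {layer}: probe accuracy {acc:.2f} (majority baseline {majority:.2f})")
```

Comparing held-out probe accuracy against a baseline, layer by layer, gives one rough picture of where a feature is encoded. Under the setup the abstract describes, the same comparison could in principle be repeated with inputs from languages that differ in their degree of contact with Persian.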

Taxonomy

Topics: Language and cultural evolution · Syntax, Semantics, Linguistic Variation · Categorization, perception, and language