Measuring cross-language intelligibility between Romance languages with computational tools
Liviu P Dinu, Ana Sabina Uban, Bogdan Iordache, Anca Dinu, Simona Georgescu

TL;DR
This paper introduces a new computational metric for mutual intelligibility among Romance languages. The metric is based on the surface and semantic similarity of related words and is validated against the results of human cloze tests.
Contribution
It presents a novel metric for estimating mutual intelligibility between languages, based on the lexical and semantic similarity of related words. The metric is applied to five Romance languages and validated against human experimental data.
Findings
Scores align with intuitive intelligibility asymmetries.
Significant correlation with human cloze test results.
Results are compared across orthographic and phonetic word forms, parallel corpora, and vector models of word meaning.
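Checking agreement with cloze tests amounts to pairing each language direction's computed score with the matching human cloze-test score and computing a correlation coefficient. Below is a minimal sketch using Pearson's r. It assumes a Pearson coefficient for illustration only: the summary does not state which coefficient or significance test the authors used, and the function name `pearson` is hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between metric scores (xs) and cloze-test scores (ys).

    Hypothetical helper for illustration; each index i is one language
    direction, e.g. Spanish readers of Italian text.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

With real data, `xs` would hold the computed intelligibility scores and `ys` the cloze-test accuracies for the same language pairs and directions. A rank-based coefficient such as Spearman's would be a reasonable alternative if only the ordering of scores is of interest.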
Abstract
We present an analysis of mutual intelligibility between related languages, applied to the Romance family. We introduce a novel computational metric for estimating intelligibility based on lexical similarity, combining the surface and semantic similarity of related words. We use it to measure mutual intelligibility among the five main Romance languages (French, Italian, Portuguese, Spanish, and Romanian), and we compare results obtained with the orthographic and phonetic forms of words, with different parallel corpora, and with different vector models of word meaning. The resulting intelligibility scores confirm intuitions about intelligibility asymmetry across languages and correlate significantly with the results of cloze tests in human experiments.
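The abstract describes the metric only at a high level, as a combination of surface and semantic similarity between related words. The sketch below illustrates that general idea and is not the authors' formula. It makes several assumptions, all hypothetical:

- word pairs come already aligned from a parallel corpus;
- surface similarity is one minus the normalized Levenshtein distance;
- semantic similarity is the cosine between word vectors living in a shared (cross-lingually aligned) space;
- the two are mixed with a weight `alpha`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def surface_similarity(a: str, b: str) -> float:
    """1 - normalized edit distance, in [0, 1]; works on spellings or phonetic strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors live in one aligned embedding space."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def intelligibility(pairs: List[Tuple[str, str]],
                    vec_src: Dict[str, Sequence[float]],
                    vec_tgt: Dict[str, Sequence[float]],
                    alpha: float = 0.5) -> float:
    """Hypothetical score: mean over aligned (source, target) word pairs of
    alpha * surface similarity + (1 - alpha) * semantic similarity.
    Words missing from either vector model get semantic similarity 0."""
    scores = []
    for src, tgt in pairs:
        sem = cosine(vec_src[src], vec_tgt[tgt]) if src in vec_src and tgt in vec_tgt else 0.0
        scores.append(alpha * surface_similarity(src, tgt) + (1 - alpha) * sem)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, Spanish *noche* and Italian *notte* differ by two substitutions over five characters, so their surface similarity is 0.6. With toy vectors that point in the same direction (semantic similarity 1.0) and `alpha=0.5`, the pair scores 0.8. Averaging such scores over the words of a source-language text is one way a directional, and hence possibly asymmetric, score could arise.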
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Reading and Literacy Development · Language Development and Disorders
