TL;DR
This paper demonstrates that native-language cognates significantly influence the lexical choices of second-language speakers, so strongly that language phylogenies can be reconstructed from the English written by non-native authors.
Contribution
It introduces a large corpus of highly competent non-native English speakers, together with computational methods for quantifying how cognates in the speakers' native languages affect their lexical selection in English.
Findings
Cognate effects significantly shape non-native lexical choices
Lexical frequency patterns can reconstruct Indo-European language phylogeny
Cognate facilitation is a key factor in non-native language production
Abstract
We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers. Introducing a large corpus of highly competent non-native English speakers, and using a set of carefully selected lexical items, we show that the lexical choices of non-natives are affected by cognates in their native language. This effect is so powerful that we are able to reconstruct the phylogenetic language tree of the Indo-European language family solely from the frequencies of specific lexical items in the English of authors with various native languages. We quantitatively analyze non-native lexical choice, highlighting cognate facilitation as one of the important phenomena shaping the language of non-native speakers.
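The abstract does not spell out how the tree is reconstructed from lexical frequencies. The sketch below shows one generic way such a reconstruction could work, and it rests on two assumptions. First, each native-language group is represented by the within-synset proportions of a few focus words. Second, the groups are clustered with average-linkage (UPGMA-style) agglomerative clustering over Euclidean distances. This is not presented as the authors' method. The synsets, the toy texts, and the function names `choice_profile` and `average_linkage_tree` are all hypothetical, and the texts are invented rather than taken from the paper's corpus.

```python
import math
from itertools import combinations

# Hypothetical focus synsets: English near-synonyms whose relative use might
# differ with the author's native language (e.g. French "commencer" -> "commence").
SYNSETS = [("start", "begin", "commence"), ("ask", "request", "demand")]


def choice_profile(text, synsets):
    """Share of each synonym within its synset, concatenated over all synsets."""
    counts = {}
    for tok in text.lower().split():
        tok = tok.strip(".,;:!?")
        counts[tok] = counts.get(tok, 0) + 1
    profile = []
    for synset in synsets:
        total = sum(counts.get(w, 0) for w in synset)
        profile.extend(counts.get(w, 0) / total if total else 0.0 for w in synset)
    return profile


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_tree(profiles):
    """Agglomerative clustering with average linkage; returns nested tuples."""
    # Each cluster key maps to (subtree, list of member vectors).
    clusters = {name: (name, [vec]) for name, vec in profiles.items()}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            ma, mb = clusters[a][1], clusters[b][1]
            # Mean pairwise distance between members of the two clusters.
            d = sum(euclidean(x, y) for x in ma for y in mb) / (len(ma) * len(mb))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        (tree_a, mem_a), (tree_b, mem_b) = clusters.pop(a), clusters.pop(b)
        clusters[a + "+" + b] = ((tree_a, tree_b), mem_a + mem_b)
    return next(iter(clusters.values()))[0]


# Invented toy texts, one per native language (NOT data from the paper).
corpora = {
    "French": "we commence now and commence again, we demand answers and ask once",
    "Italian": "we commence now and start again, we demand answers and demand twice",
    "German": "we begin now and begin again, we ask questions and ask twice",
    "Dutch": "we begin now and start again, we ask questions and request twice",
}
profiles = {lang: choice_profile(t, SYNSETS) for lang, t in corpora.items()}
tree = average_linkage_tree(profiles)
print(tree)  # (('Dutch', 'German'), ('French', 'Italian'))
```

On this toy input the Romance-language profiles and the Germanic-language profiles each merge first. The example only illustrates the mechanics. With real data, the resulting tree would depend on the choice of focus words, the distance measure, and the linkage criterion.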
