Relationship of the language distance to English ability of a country
Cao Xinxin, Lei Xiaolan, Murtadha Ahmed

TL;DR
This paper introduces a neural network-based semantic language distance measure and demonstrates its significant negative correlation with English proficiency across countries, especially in productive language skills.
Contribution
It proposes a novel semantic language distance metric using multilingual embeddings and empirically links it to variations in country-level English abilities.
Findings
Language distance negatively correlates with English proficiency.
The effect is stronger on speaking and writing skills.
Semantic language distance explains part of the variation in English ability.
Abstract
Language difference is one of the factors that hinder the acquisition of second language skills. In this article, we introduce a novel solution that leverages deep neural networks to measure the semantic dissimilarity between languages based on their word distributions in the embedding space of a multilingual pre-trained language model (e.g., BERT). We then empirically examine how well the proposed semantic language distance (SLD) explains the consistent variation in countries' English ability, which is proxied by their performance on the Internet-Based Test of English as a Foreign Language (TOEFL iBT). The experimental results show that language distance has a negative influence on a country's average English ability. Interestingly, the effect is more significant on speaking and writing subskills, which pertain to the productive aspects of…
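To make the idea in the abstract concrete, here is a minimal sketch of one way a distance between languages could be computed from embedding vectors and then correlated with country-level scores. It is not the paper's SLD: the abstract does not give the exact formula, so the centroid cosine distance used here is an assumed stand-in, and the function names (`centroid_cosine_distance`, `pearson_r`) are hypothetical. The embeddings are random synthetic vectors rather than outputs of a multilingual model, and the scores are made-up placeholders, not TOEFL iBT data.

```python
import numpy as np


def centroid_cosine_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine distance between the mean vectors of two embedding sets.

    Assumed stand-in for a language distance; the paper's SLD may differ.
    """
    centroid_a = emb_a.mean(axis=0)
    centroid_b = emb_b.mean(axis=0)
    cosine = centroid_a @ centroid_b / (
        np.linalg.norm(centroid_a) * np.linalg.norm(centroid_b)
    )
    return float(1.0 - cosine)


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


# Synthetic stand-ins for word embeddings (NOT real model outputs).
rng = np.random.default_rng(0)
dim = 16
base = np.ones(dim)
# Direction orthogonal to `base`, used to push other "languages" away from it.
drift = np.array([1.0, -1.0] * (dim // 2))

english = base + 0.1 * rng.normal(size=(200, dim))
other_languages = [
    base + k * drift + 0.1 * rng.normal(size=(200, dim))
    for k in (0.2, 0.5, 1.0, 2.0)
]

# Distance of each synthetic language from the synthetic "English" set.
distances = [centroid_cosine_distance(english, lang) for lang in other_languages]

# Made-up placeholder scores, chosen to decrease with distance so the script
# has something to correlate. They are not TOEFL iBT results.
placeholder_scores = [95.0, 90.0, 80.0, 70.0]

r = pearson_r(distances, placeholder_scores)
print("distances:", [round(d, 3) for d in distances])
print("Pearson r:", round(r, 3))
```

With real data, the synthetic arrays would be replaced by embeddings of each language's vocabulary taken from a multilingual model, and the placeholder scores by the per-country test averages. The negative correlation in this toy run is built in by construction and says nothing about the paper's actual results.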
Taxonomy
Topics: Second Language Learning and Teaching · Online Learning and Analytics
Methods: Test
