What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty

Jonas Mayer Martins; Zhuojing Huang; Aaricia Herygers; Lisa Beinborn

arXiv:2605.12281·cs.CL·May 13, 2026

TL;DR

This study models English vocabulary difficulty for learners with Spanish, German, or Chinese as their native language, highlighting the role of familiarity, orthographic transfer, and surface features in predicting difficulty.

Contribution

It introduces an interpretable, L1-specific modeling approach for vocabulary difficulty, incorporating cross-linguistic transfer and feature importance analysis.

Findings

1. Familiarity is the most important feature across all L1 groups.

2. Orthographic transfer influences Spanish and German learners' difficulty predictions.

3. Chinese learners' difficulty is mainly shaped by familiarity and surface features.

Abstract

What makes a word difficult to learn, and how does the difficulty depend on the learner's native language? We computationally model vocabulary difficulty for English learners whose first language is Spanish, German, or Chinese with gradient-boosted models trained on features related to a word's familiarity (e.g., frequency), meaning, surface form, and cross-linguistic transfer. Using Shapley values, we determine the importance of each feature group. Word familiarity is the dominant feature group shared by all three languages. However, predictions for Spanish- and German-speaking learners rely additionally on orthographic transfer. This transfer mechanism is unavailable to Chinese learners, whose difficulty is shaped by a combination of familiarity and surface features alone. Our models provide interpretable, L1-tailored difficulty estimates that can be used to design vocabulary…
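
To make the idea of Shapley-based group importance concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not show their models or attribution code, so the payoff function `toy_value`, its numbers, and the `surface`/`transfer` overlap are all invented for illustration. Only the four group names come from the abstract. The sketch computes exact Shapley values by enumerating every coalition of feature groups, which is feasible only because there are just four groups.

```python
from itertools import combinations
from math import factorial

# Group names follow the abstract; everything else below is hypothetical.
GROUPS = ["familiarity", "meaning", "surface", "transfer"]


def toy_value(coalition):
    """Invented payoff for a set of feature groups (NOT results from the paper).

    Stands in for whatever quantity is being attributed, e.g. the score a
    model reaches when only these groups are available.
    """
    base = {"familiarity": 0.40, "meaning": 0.05, "surface": 0.10, "transfer": 0.15}
    score = sum(base[g] for g in coalition)
    # Assumed redundancy: surface and transfer features partly overlap.
    if "surface" in coalition and "transfer" in coalition:
        score -= 0.04
    return score


def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution of each
    player over all coalitions of the remaining players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


importance = shapley_values(GROUPS, toy_value)
for group, phi in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{group:12s} {phi:.3f}")
```

By construction, the values sum to the payoff of the full set of groups minus the payoff of the empty set, so each group's share can be read as its portion of the total. With real gradient-boosted models, per-feature attributions are usually approximated or computed with tree-specific algorithms rather than by brute-force enumeration.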
