Dispersion Measures as Predictors of Lexical Decision Time, Word Familiarity, and Lexical Complexity
Adam Nohejl, Taro Watanabe

TL;DR
This study evaluates various dispersion measures as predictors of lexical decision time, word familiarity, and lexical complexity across five languages, finding that the logarithm of range outperforms both the other dispersion measures and log-frequency.
Contribution
It provides an external validation of dispersion measures and identifies the logarithm of range as a more effective predictor than traditional log-frequency.
Findings
The logarithm of range outperforms log-frequency as a predictor across all tasks and languages.
Logarithmic transformation improves predictor effectiveness.
Dispersion measures vary in predictive power across languages.
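Range is conventionally the number of corpus parts (e.g. documents) in which a word occurs at least once, while frequency counts all of its occurrences. The sketch below computes both, plus their logarithms, from a corpus given as a list of tokenized parts. The helper names and the add-one smoothing before taking the log are assumptions for illustration, not necessarily the authors' exact setup.

```python
import math
from collections import Counter


def frequency_and_range(parts):
    """Return (frequency, range) counters for a corpus given as a list of parts.

    Each part is a list of tokens. Frequency counts every occurrence;
    range counts the number of parts in which a word occurs at least once.
    """
    freq = Counter()
    rng = Counter()
    for part in parts:
        freq.update(part)       # every token occurrence
        rng.update(set(part))   # at most one count per part
    return freq, rng


def log_predictors(word, freq, rng):
    # Assumed add-one smoothing: unseen words get log(1) = 0 rather than log(0).
    return math.log(freq[word] + 1), math.log(rng[word] + 1)


parts = [
    ["the", "cat", "sat", "cat", "cat"],
    ["the", "dog", "ran"],
    ["a", "dog", "and", "the", "bird"],
]
freq, rng = frequency_and_range(parts)
print(freq["cat"], rng["cat"])  # 3 1
print(freq["the"], rng["the"])  # 3 3
```

Here "cat" and "the" have the same frequency, but "the" is spread over all three parts while "cat" is concentrated in one; range captures that difference and raw frequency does not.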
Abstract
Various measures of dispersion have been proposed to paint a fuller picture of a word's distribution in a corpus, but little has been done to validate them externally. We evaluate a wide range of dispersion measures as predictors of lexical decision time, word familiarity, and lexical complexity in five diverse languages. We find that the logarithm of range is not only a better predictor than log-frequency across all tasks and languages, but that it is also the most powerful additional variable to log-frequency, consistently outperforming the more complex dispersion measures. We discuss the effects of corpus part granularity and logarithmic transformation, shedding light on contradictory results of previous studies.
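To make "more complex dispersion measures" and "corpus part granularity" concrete, the sketch below uses Gries's deviation of proportions (DP) as one widely used example. It is chosen here for illustration only; this page does not say which measures the paper evaluated. DP compares each part's share of a word's occurrences with that part's share of the corpus: 0 means the word is spread exactly in proportion to part sizes, and larger values mean it is clumped. The function names are hypothetical.

```python
from collections import Counter


def deviation_of_proportions(word, parts):
    """Gries's DP for `word` over a corpus given as a list of token lists."""
    total_tokens = sum(len(p) for p in parts)
    counts = [Counter(p)[word] for p in parts]
    f = sum(counts)
    if f == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")
    # Half the summed absolute differences between observed and expected shares.
    return 0.5 * sum(
        abs(v / f - len(p) / total_tokens) for v, p in zip(counts, parts)
    )


def word_range(word, parts):
    """Number of parts containing `word` at least once."""
    return sum(1 for p in parts if word in p)


parts = [
    ["the", "cat", "sat", "cat", "cat"],
    ["the", "dog", "ran"],
    ["a", "dog", "and", "the", "bird"],
]
print(deviation_of_proportions("cat", parts))  # 8/13, clumped in one part
print(deviation_of_proportions("the", parts))  # 4/39, evenly spread

# Granularity: the same tokens split into fewer parts change both measures.
coarser = [parts[0] + parts[1], parts[2]]
single = [sum(parts, [])]
print(word_range("the", parts), word_range("the", coarser), word_range("the", single))
```

With a single part, every occurring word has range 1 and DP 0, so neither measure can distinguish anything. This is one way the choice of corpus parts can affect how well dispersion measures predict behavioral data.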
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Text Readability and Simplification · Second Language Acquisition and Learning
