Dispersion Measures as Predictors of Lexical Decision Time, Word Familiarity, and Lexical Complexity

Adam Nohejl; Taro Watanabe

arXiv:2501.06536·cs.CL·January 14, 2025

TL;DR

This study evaluates dispersion measures as predictors of lexical decision time, word familiarity, and lexical complexity in five languages. It finds that the logarithm of range outperforms both log-frequency and the more complex dispersion measures.

Contribution

It provides an external validation of dispersion measures and identifies the logarithm of range as a more effective predictor than traditional log-frequency.

Findings

01. Logarithm of range outperforms log-frequency in predictions.

02. Logarithmic transformation improves predictor effectiveness.

03. Dispersion measures vary in predictive power across languages.

Abstract

Various measures of dispersion have been proposed to paint a fuller picture of a word's distribution in a corpus, but little has been done to validate them externally. We evaluate a wide range of dispersion measures as predictors of lexical decision time, word familiarity, and lexical complexity in five diverse languages. We find that the logarithm of range is not only a better predictor than log-frequency across all tasks and languages, but also the most powerful additional variable to log-frequency, consistently outperforming the more complex dispersion measures. We discuss the effects of corpus part granularity and logarithmic transformation, shedding light on contradictory results of previous studies.
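
To make the two central predictors concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of range as the number of corpus parts in which a word occurs at least once, and it assumes the corpus is already tokenized and split into parts. The function names `range_and_frequency` and `log_predictors` and the toy corpus are hypothetical; the abstract does not specify the log base or any smoothing, so the natural logarithm is used here without smoothing.

```python
import math
from collections import Counter


def range_and_frequency(parts):
    """Count total frequency and range for every word.

    parts: list of token lists, one list per corpus part (assumed layout).
    Returns (freq, rng), where freq[w] is the total number of occurrences
    of w and rng[w] is the number of parts containing w at least once.
    """
    freq = Counter()
    rng = Counter()
    for tokens in parts:
        freq.update(tokens)       # every occurrence counts toward frequency
        rng.update(set(tokens))   # each part counts at most once toward range
    return freq, rng


def log_predictors(word, freq, rng):
    """Return (log-frequency, log-range) for a word seen in the corpus.

    math.log raises ValueError for words that never occur (count 0);
    handling such words is outside the scope of this sketch.
    """
    return math.log(freq[word]), math.log(rng[word])


# Toy corpus with three parts (illustrative data only).
parts = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "cat", "cat"],
]
freq, rng = range_and_frequency(parts)
```

In this toy corpus, "cat" is more frequent than "the" (4 occurrences against 3) but has a smaller range (2 parts against 3), which is the kind of divergence between frequency and dispersion that the paper evaluates. How the corpus is split into parts changes the range values, which relates to the abstract's point about corpus part granularity.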

Code & Models

Repositories

naist-nlp/tubelex

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Text Readability and Simplification · Second Language Acquisition and Learning